Xbr*_*izh 0 bash awk grouping grep count
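I want to analyze a log file. It contains several operations, and each operation contains a set of sub-operations. I want to extract the number of sub-operations grouped by operation. This would be easy in SQL, but I am stuck doing it in bash.
Here is a simplified version of the file: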
[21:30:21.538Z #a9a.012 DEBUG - - ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4759-9-13-41; Tasks: [ingestion-4759-9-13-41.1.43, ingestion-4759-9-13-41.1.44, ingestion-4759-9-13-41.1.41]
otherlogs stuff ...
[21:31:21.538Z #a9a.012 DEBUG - - ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-17-4; Tasks: [ingestion-4757-10-17-4.1.2, ingestion-4757-10-17-4.1.1, ingestion-4757-10-17-4.1.3, ingestion-4757-10-17-4.1.4]
otherlogs stuff ...
[21:31:21.690Z #a9a.012 DEBUG - - ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-18-3; Tasks: [ingestion-4757-10-18-3.1.137, ingestion-4757-10-18-3.1.139, ingestion-4757-10-18-3.1.138, ingestion-4757-10-18-3.1.140, ingestion-4757-10-18-3.1.136, ingestion-4757-10-18-3.1.141]
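Each operation name is the part before the first dot; the remainder identifies the sub-operation.
I am looking for a result like the following, which I could, for example, store in a file: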
operationName suboperationCount
ingestion-4759-9-13-41 3
ingestion-4757-10-17-4 4
ingestion-4757-10-18-3 6
I have been trying several combinations, such as cat xlogs.txt | grep 'ingestion' | uniq | wc -w > fileresult.txt
but that only returns a single overall total.
Thanks!
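EDIT: After the OP's comment we know that only the ids inside Tasks need to be counted. In that case you could try the following, written strictly on the assumption that the string Tasks appears only once in each line of your Input_file.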
awk '
{
  sub(/.*Tasks/,"Tasks")
  while(match($0,/ingestion-[0-9-]+/)){
    arr[substr($0,RSTART,RLENGTH)]++
    $0=substr($0,RSTART+RLENGTH)
  }
}
END{
  for(i in arr){
    print i,arr[i]
  }
}' Input_file
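The question also asks for a header line and a result file, and awk's `for (i in arr)` visits keys in an unspecified order. A possible wrapper is sketched below; it is only an illustration, not part of the original answer. The helper name `count_tasks`, the file names `sample.log` and `fileresult.txt`, and the abbreviated heredoc log are all assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: count_tasks, sample.log and fileresult.txt are hypothetical names.
workdir=$(mktemp -d)

# Abbreviated stand-in for the real log (same "Job: ...; Tasks: [...]" shape).
cat > "$workdir/sample.log" <<'EOF'
[21:30:21.538Z] Reporter: Partition: tenant-xla; Job: ingestion-4759-9-13-41; Tasks: [ingestion-4759-9-13-41.1.43, ingestion-4759-9-13-41.1.44, ingestion-4759-9-13-41.1.41]
otherlogs stuff ...
[21:31:21.538Z] Reporter: Partition: tenant-xla; Job: ingestion-4757-10-17-4; Tasks: [ingestion-4757-10-17-4.1.2, ingestion-4757-10-17-4.1.1, ingestion-4757-10-17-4.1.3, ingestion-4757-10-17-4.1.4]
EOF

# Same counting logic as the awk program above; sort makes the order stable.
count_tasks() {
  awk '
  {
    sub(/.*Tasks/, "Tasks")                      # drop everything before Tasks, including the Job id
    while (match($0, /ingestion-[0-9-]+/)) {
      arr[substr($0, RSTART, RLENGTH)]++         # one more task for this operation
      $0 = substr($0, RSTART + RLENGTH)          # continue after the match
    }
  }
  END {
    for (i in arr) print i, arr[i]
  }' "$1" | sort
}

# Write the header followed by the sorted counts into the result file.
{
  echo "operationName suboperationCount"
  count_tasks "$workdir/sample.log"
} > "$workdir/fileresult.txt"

cat "$workdir/fileresult.txt"
```

Sorting is by operation name here; if the order of first appearance in the log matters, a different approach would be needed.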
With awk:
Could you please try the following, written and tested with the shown samples.
awk '
{
  sub(/.*Tasks/,"Tasks")
  while(match($0,/ingestion-[0-9-]+/)){
    arr[substr($0,RSTART,RLENGTH)]++
    $0=substr($0,RSTART+RLENGTH)
  }
}
END{
  for(i in arr){
    print i,arr[i]
  }
}' Input_file
Explanation: Adding a detailed explanation for the above.
awk '                                       ##Starting awk program from here.
{
  sub(/.*Tasks/,"Tasks")                    ##Replacing everything up to and including Tasks with Tasks, so the Job id is not counted.
  while(match($0,/ingestion-[0-9-]+/)){     ##Running while loop as long as match function finds the regex in the current line.
    arr[substr($0,RSTART,RLENGTH)]++        ##Creating array arr which has the matched substring as index and increasing its value by 1 here.
    $0=substr($0,RSTART+RLENGTH)            ##Now saving rest of the line (after the matched regex above) into current line.
  }
}
END{                                        ##Starting END block of this awk program from here.
  for(i in arr){                            ##Traversing through all elements of arr here.
    print i,arr[i]                          ##Printing index of array and value of array with index of i.
  }
}' Input_file                               ##Mentioning Input_file name here.
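To see how the loop walks through a line, the small trace below prints RSTART, RLENGTH and the matched text on each iteration. It is an illustration only: the one-line input is made up for the example, and the variable name `trace` is hypothetical.

```shell
#!/bin/sh
# Made-up single line with two task ids of the same (hypothetical) operation.
trace=$(echo 'Tasks: [ingestion-1-2.1.5, ingestion-1-2.1.6]' | awk '
{
  while (match($0, /ingestion-[0-9-]+/)) {
    # match() sets RSTART (1-based start) and RLENGTH (length) of the match.
    printf "RSTART=%d RLENGTH=%d match=%s\n", RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    # Keep only the text after the match, so the next match() finds the next id.
    $0 = substr($0, RSTART + RLENGTH)
  }
}')

echo "$trace"
# RSTART=9 RLENGTH=13 match=ingestion-1-2
# RSTART=7 RLENGTH=13 match=ingestion-1-2
```

The regex stops at the first dot because `.` is not in the bracket expression `[0-9-]`, which is why the sub-operation suffix (`.1.5`, `.1.6`) never becomes part of the array index.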