How to group and count each subgroup in a log with bash

Xbr*_*izh 0 bash awk grouping grep count

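I want to analyze a log file. It contains several operations, and each operation has a set of sub-operations. I want to extract the number of sub-operations grouped by operation. This is easy in SQL, but I am stuck in bash.

Here is a simplified version of the file: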

    [21:30:21.538Z #a9a.012 DEBUG -            -   ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4759-9-13-41; Tasks: [ingestion-4759-9-13-41.1.43, ingestion-4759-9-13-41.1.44, ingestion-4759-9-13-41.1.41]

    otherlogs stuff ...

    [21:31:21.538Z #a9a.012 DEBUG -            -   ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-17-4; Tasks: [ingestion-4757-10-17-4.1.2, ingestion-4757-10-17-4.1.1, ingestion-4757-10-17-4.1.3, ingestion-4757-10-17-4.1.4]

    otherlogs stuff ...

    [21:31:21.690Z #a9a.012 DEBUG -            -   ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-18-3; Tasks: [ingestion-4757-10-18-3.1.137, ingestion-4757-10-18-3.1.139, ingestion-4757-10-18-3.1.138, ingestion-4757-10-18-3.1.140, ingestion-4757-10-18-3.1.136, ingestion-4757-10-18-3.1.141]

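Each operation is the part of a task id before the first dot (for example ingestion-4759-9-13-41); the rest identifies the sub-operation.

I am looking for a result like the following, which I could, for example, store in a file: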

    operationName            suboperationCount
    ingestion-4759-9-13-41         3
    ingestion-4757-10-17-4         4
    ingestion-4757-10-18-3         6

I have been trying several combinations, such as cat xlogs.txt | grep 'ingestion' | uniq | wc -w > fileresult.txt

But this only returns a single global number.

Thanks!

Rav*_*h13 5

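EDIT: Since the OP's comment clarified that only the ids inside Tasks should be counted, you could try the following. It strictly assumes that each line of your Input_file contains only one Tasks string.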

    awk '
    {
      sub(/.*Tasks/,"Tasks")
      while(match($0,/ingestion-[0-9-]+/)){
        arr[substr($0,RSTART,RLENGTH)]++
        $0=substr($0,RSTART+RLENGTH)
      }
    }
    END{
      for(i in arr){
        print i,arr[i]
      }
    }'  Input_file
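
To get closer to the layout asked for in the question, the output of the loop above can be given a header, sorted, and redirected to a file. This is only a sketch: sample_xlogs.txt is a made-up file name holding the three sample lines, fileresult.txt is the name used in the question, and the column width of 28 is an arbitrary choice.

```shell
# Hypothetical sample file with the log lines from the question.
cat > sample_xlogs.txt <<'EOF'
[21:30:21.538Z #a9a.012 DEBUG -            -   ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4759-9-13-41; Tasks: [ingestion-4759-9-13-41.1.43, ingestion-4759-9-13-41.1.44, ingestion-4759-9-13-41.1.41]
otherlogs stuff ...
[21:31:21.538Z #a9a.012 DEBUG -            -   ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-17-4; Tasks: [ingestion-4757-10-17-4.1.2, ingestion-4757-10-17-4.1.1, ingestion-4757-10-17-4.1.3, ingestion-4757-10-17-4.1.4]
otherlogs stuff ...
[21:31:21.690Z #a9a.012 DEBUG -            -   ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-18-3; Tasks: [ingestion-4757-10-18-3.1.137, ingestion-4757-10-18-3.1.139, ingestion-4757-10-18-3.1.138, ingestion-4757-10-18-3.1.140, ingestion-4757-10-18-3.1.136, ingestion-4757-10-18-3.1.141]
EOF

{
  # Header row, padded to the same width as the data rows.
  printf '%-28s %s\n' operationName suboperationCount
  awk '
  {
    sub(/.*Tasks/,"Tasks")                   # drop everything before the Tasks list
    while(match($0,/ingestion-[0-9-]+/)){    # one iteration per task id on the line
      arr[substr($0,RSTART,RLENGTH)]++       # count it under its operation name
      $0=substr($0,RSTART+RLENGTH)           # continue after the match
    }
  }
  END{
    for(i in arr){
      printf "%-28s %d\n", i, arr[i]
    }
  }' sample_xlogs.txt | sort                 # for-in order is unspecified, so sort
} > fileresult.txt

cat fileresult.txt
```

With the three sample lines, fileresult.txt should hold the header followed by ingestion-4757-10-17-4 with 4, ingestion-4757-10-18-3 with 6 and ingestion-4759-9-13-41 with 3.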

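With awk, could you please try the following, written and tested with the shown samples. Note that it counts every ingestion id on a line, including the one in the Job: field.

    awk '
    {
      while(match($0,/ingestion-[0-9-]+/)){
        arr[substr($0,RSTART,RLENGTH)]++
        $0=substr($0,RSTART+RLENGTH)
      }
    }
    END{
      for(i in arr){
        print i,arr[i]
      }
    }'  Input_file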

Explanation: a detailed explanation of the above.

    awk '                                       ## Start the awk program.
    {
      while(match($0,/ingestion-[0-9-]+/)){     ## Loop for as long as match() finds the regex in the current line.
        arr[substr($0,RSTART,RLENGTH)]++        ## Use the matched substring as the index into array arr and increase its count by 1.
        $0=substr($0,RSTART+RLENGTH)            ## Replace the current line with whatever follows the match.
      }
    }
    END{                                        ## Start the END block, which runs after all input has been read.
      for(i in arr){                            ## Traverse all elements of arr.
        print i,arr[i]                          ## Print each index (the operation name) and its count.
      }
    }' Input_file                               ## Name of the input file.
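
For comparison with the pipeline tried in the question, the same per-operation count can also be sketched without an awk array, using sed, grep -o, sort and uniq -c. This is a different technique from the awk answer, not a replacement for it. It assumes a grep that supports -o (GNU and BSD grep do), and sample_xlogs.txt and pipeline_result.txt are hypothetical file names.

```shell
# Hypothetical sample file with the log lines from the question.
cat > sample_xlogs.txt <<'EOF'
[21:30:21.538Z #a9a.012 DEBUG -            -   ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4759-9-13-41; Tasks: [ingestion-4759-9-13-41.1.43, ingestion-4759-9-13-41.1.44, ingestion-4759-9-13-41.1.41]
otherlogs stuff ...
[21:31:21.538Z #a9a.012 DEBUG -            -   ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-17-4; Tasks: [ingestion-4757-10-17-4.1.2, ingestion-4757-10-17-4.1.1, ingestion-4757-10-17-4.1.3, ingestion-4757-10-17-4.1.4]
otherlogs stuff ...
[21:31:21.690Z #a9a.012 DEBUG -            -   ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-18-3; Tasks: [ingestion-4757-10-18-3.1.137, ingestion-4757-10-18-3.1.139, ingestion-4757-10-18-3.1.138, ingestion-4757-10-18-3.1.140, ingestion-4757-10-18-3.1.136, ingestion-4757-10-18-3.1.141]
EOF

# 1. sed:   keep only the text after "Tasks: " (lines without it are dropped)
# 2. grep:  print every operation id on its own line
# 3. sort:  bring identical ids next to each other
# 4. uniq:  count each run of identical ids
# 5. awk:   swap the columns to "name count"
sed -n 's/.*Tasks: //p' sample_xlogs.txt |
  grep -oE 'ingestion-[0-9-]+' |
  sort |
  uniq -c |
  awk '{print $2, $1}' > pipeline_result.txt

cat pipeline_result.txt
# ingestion-4757-10-17-4 4
# ingestion-4757-10-18-3 6
# ingestion-4759-9-13-41 3
```

The attempt in the question returns one global number because uniq without -c only removes adjacent duplicate lines and wc -w then counts every remaining word. Grouping needs one id per line, a sort, and uniq -c.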