Parsing a log file with sed -e: need to count unique class names

Question

wir*_*iko 6 sed text-processing regular-expression

I have a file, call it filename.log, that contains lines like this:

(2014-11-18 14:09:21,766), , xxxxxx.local, EventSystem, DEBUG FtpsFile delay secs is 5 [pool-3-thread-7] 
(2014-11-18 14:09:21,781), , xxxxxx.local, EventSystem, DEBUG FtpsFile disconnected from ftp server [pool-3-thread-7] 
(2014-11-18 14:09:21,798), , xxxxxx.local, EventSystem, DEBUG FtpsFile FTP File  Process@serverStatus on exit  - 113 [pool-3-thread-7] 
(2014-11-18 14:09:21,798), , xxxxxx.local, EventSystem, DEBUG FtpsFile FTP File  Process@serverStatus on exit  - 114 [pool-3-thread-7] 
(2014-11-18 14:09:21,799), , xxxxxx.local, EventSystem, DEBUG JobQueue $_Runnable Finally of consume() :: [pool-3-thread-7]

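I am trying to find the classes that produce the most DEBUG messages.

In this sample, you can see that FtpsFile and JobQueue are the two classes generating messages.

This is what I have so far: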

cat filename.log | sed -n -e 's/^.*\(DEBUG \)/\1/p' | sort | uniq -c | sort -rn | head -10

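This extracts the class names and is meant to show the 10 most frequent ones.

The problem is that it does not give me a count of 4 for the FtpsFile class. Because the substitution keeps everything after DEBUG, each FtpsFile log line is counted as a distinct entry.

How can I change the command above so that it grabs only the first word after DEBUG and ignores the rest of the line when counting?

Ideally I should get 4 FtpsFile and 1 JobQueue.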

Answer 1

jim*_*mij 2

With GNU sed:

sed -n 's/.*DEBUG \(\w*\).*/\1/p' filename.log | sort | uniq -c
      4 FtpsFile
      1 JobQueue
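
Note that `uniq -c` only merges adjacent identical lines, so on a log where messages from different classes interleave, the class names need to be sorted before counting. The following is a minimal sketch of the full top-10 ranking the question asks for. It assumes GNU sed (for `\w`), and the temporary sample file and the `top_classes` variable are made up for the demonstration:

```shell
# Build a small hypothetical sample log in which the classes interleave.
log=$(mktemp)
cat > "$log" <<'EOF'
(2014-11-18 14:09:21,766), , xxxxxx.local, EventSystem, DEBUG FtpsFile delay secs is 5 [pool-3-thread-7]
(2014-11-18 14:09:21,781), , xxxxxx.local, EventSystem, DEBUG JobQueue $_Runnable Finally of consume() :: [pool-3-thread-7]
(2014-11-18 14:09:21,798), , xxxxxx.local, EventSystem, DEBUG FtpsFile disconnected from ftp server [pool-3-thread-7]
EOF

# -n plus the p flag prints only lines where the substitution matched,
# so lines without DEBUG are dropped. sort groups equal names for uniq -c,
# and sort -rn | head -10 ranks the counts and keeps the top 10.
top_classes=$(sed -n 's/.*DEBUG \(\w*\).*/\1/p' "$log" | sort | uniq -c | sort -rn | head -10)
printf '%s\n' "$top_classes"
rm -f "$log"
```

On this sample the ranking lists FtpsFile with a count of 2 ahead of JobQueue with a count of 1.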

With GNU grep (the -P option enables Perl-compatible regular expressions):

grep -Po 'DEBUG \K\w+' filename.log | sort | uniq -c
      4 FtpsFile
      1 JobQueue

With awk:

awk '$6=="DEBUG"{print $7}' filename.log | sort | uniq -c
      4 FtpsFile
      1 JobQueue

The last one could be done in pure awk, but to keep it similar to the others I piped it to uniq.
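
A pure-awk variant might look like the following sketch. It assumes the same whitespace-separated layout as the sample, where DEBUG is field 6 and the class name is field 7. The temporary file, the `counts` array, and the `result` variable are illustrative names, not part of the original answer:

```shell
# Recreate the sample from the question in a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
(2014-11-18 14:09:21,766), , xxxxxx.local, EventSystem, DEBUG FtpsFile delay secs is 5 [pool-3-thread-7]
(2014-11-18 14:09:21,781), , xxxxxx.local, EventSystem, DEBUG FtpsFile disconnected from ftp server [pool-3-thread-7]
(2014-11-18 14:09:21,798), , xxxxxx.local, EventSystem, DEBUG FtpsFile FTP File  Process@serverStatus on exit  - 113 [pool-3-thread-7]
(2014-11-18 14:09:21,798), , xxxxxx.local, EventSystem, DEBUG FtpsFile FTP File  Process@serverStatus on exit  - 114 [pool-3-thread-7]
(2014-11-18 14:09:21,799), , xxxxxx.local, EventSystem, DEBUG JobQueue $_Runnable Finally of consume() :: [pool-3-thread-7]
EOF

# Tally field 7 whenever field 6 is DEBUG, then print the totals at the end.
# The iteration order of "for (name in counts)" is unspecified in awk,
# so the output is sorted numerically afterwards to rank it.
result=$(awk '$6 == "DEBUG" { counts[$7]++ }
              END { for (name in counts) print counts[name], name }' "$log" | sort -rn)
printf '%s\n' "$result"
rm -f "$log"
```

Because awk keeps the tally in an associative array, the input does not need to be sorted before counting; only the final ranking step uses sort.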
