awk + count a string in a file

Question

yae*_*ael 0 linux sed awk shell-script text-processing

We have a large file like the one below.

This is a partial listing of the file:

Topic: Ho_HTR_bvt     Partition: 31   Leader: 1007    Replicas: 1007,1008,1009        Isr: 1009,1007,1008
Topic: Ho_HTR_bvt     Partition: 32   Leader: 1008    Replicas: 1008,1009,1010        Isr: 1010,1009,1008
Topic: Ho_HTR_bvt     Partition: 33   Leader: 1009    Replicas: 1009,1010,1006        Isr: 1009,1010,1006
Topic: Ho_HTR_bvt     Partition: 34   Leader: 1010    Replicas: 1010,1006,1007        Isr: 1006,1007,1010
Topic: Ho_HTR_bvt     Partition: 35   Leader: 1006    Replicas: 1006,1008,1009        Isr: 1006,1009,1008
Topic: Ho_HTR_bvt     Partition: 36   Leader: 1007    Replicas: 1007,1009,1010        Isr: 1010,1007,1009
Topic: Ho_HTR_bvt     Partition: 37   Leader: 1008    Replicas: 1008,1010,1006        Isr: 1006,1010,1008
Topic: Ho_HTR_bvt     Partition: 38   Leader: 1009    Replicas: 1009,1006,1007        Isr: 1007,1009,1006
Topic: Ho_HTR_bvt     Partition: 39   Leader: 1010    Replicas: 1010,1007,1008        Isr: 1010,1007,1008
Topic: Ho_HTR_bvt     Partition: 40   Leader: 1006    Replicas: 1006,1009,1010        Isr: 1006,1010,1009
Topic: Ho_HTR_bvt     Partition: 41   Leader: 1007    Replicas: 1007,1010,1006        Isr: 1006,1007,1010
Topic: Ho_HTR_bvt     Partition: 42   Leader: 1008    Replicas: 1008,1006,1007        Isr: 1006,1007,1008
Topic: Ho_HTR_bvt     Partition: 43   Leader: 1009    Replicas: 1009,1007,1008        Isr: 1009,1007,1008
Topic: Ho_HTR_bvt     Partition: 44   Leader: 1010    Replicas: 1010,1008,1009        Isr: 1010,1009,1008


How can I count the occurrences of the number string 1007?

Or of any other word in the file?

Answer 1

Kus*_*nda 5

Using your example data:

$ grep -Fo 1007 file | wc -l
      19


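The grep part of this pipeline searches for the string 1007 (the -F flag is used because we are doing a string comparison, not a regular expression match). Because of the -o flag, it returns each individual instance of the string on a line of its own. The number of lines returned is then counted by wc -l.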

If the string occurs twice on one line of the input data, it is counted twice. If the string occurs as a substring of another word, it is counted as well.
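If counting substrings of longer words is not wanted, one possible variation is to add -w, which GNU and BSD grep support (it is not part of POSIX grep, so treat its availability as an assumption). The sketch below uses a small made-up sample file and two hypothetical helper functions, count_substring and count_word, to show the difference:

```shell
# Hypothetical sample: 1007 appears twice as a whole word and once
# inside the longer number 10070.
sample=$(mktemp)
printf '%s\n' \
  'Leader: 1007    Replicas: 1007,1008,1009' \
  'Leader: 10070   Replicas: 1008,1009,1010' > "$sample"

# Substring count: also matches the 1007 inside 10070.
count_substring() { grep -Fo -- "$1" "$2" | wc -l | tr -d ' '; }

# Whole-word count: -w only accepts matches that are not adjacent to
# letters, digits, or underscores.
count_word() { grep -Fow -- "$1" "$2" | wc -l | tr -d ' '; }

count_substring 1007 "$sample"   # prints 3
count_word 1007 "$sample"        # prints 2
```

The tr -d ' ' step only strips the padding that some wc implementations put in front of the number.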

With awk:

$ awk -v str="1007" '{ c += gsub(str, str) } END { print c }' file
19


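This counts the occurrences of the string using gsub() (the function returns the number of substitutions it performed, and we apply it to each input line separately) and prints the total count at the end. The string we are interested in is passed on the command line with -v str="1007".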
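One caveat: gsub() treats its first argument as a regular expression, so a search string containing characters such as . or * may match more than the literal text. A minimal sketch of a literal-string alternative, assuming a POSIX awk with ENVIRON, is to walk each line with index() and substr(). The function names count_regex and count_literal and the sample data are made up for illustration:

```shell
# Hypothetical sample line: only the first token is literally "1.07".
sample=$(mktemp)
printf '%s\n' '1.07 1007 1x07' > "$sample"

# gsub()-based count as in the answer: str is used as a regex.
count_regex() {
  awk -v str="$1" '{ c += gsub(str, str) } END { print c + 0 }' "$2"
}

# Literal count: index() does plain string search, no regex involved.
# The string is passed through the environment so that awk does not
# interpret backslash escapes in it, as it would with -v.
count_literal() {
  s="$1" awk '
    BEGIN { s = ENVIRON["s"]; n = length(s) }
    n > 0 {
      line = $0
      # Find each non-overlapping occurrence, then skip past it.
      while ((i = index(line, s)) > 0) {
        c++
        line = substr(line, i + n)
      }
    }
    END { print c + 0 }
  ' "$2"
}

count_regex   '1.07' "$sample"   # prints 3 (the dot matches any character)
count_literal '1.07' "$sample"   # prints 1
```

For a plain digit string such as 1007 both functions give the same count, so the difference only matters when the search string contains regex metacharacters.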
