Get the most frequent lines from a file in Linux

Question

Jim*_*Jim 13 linux unix grep command-line ubuntu

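I have a text file with a different word on each line.
How can I find the 12 lines that occur most frequently in the file and display them?
I'm not very good with scripting commands.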

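It would be great to get the command along with an explanation, so that I can understand how to use it and build on what I know about these commands!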

Answer 1

slh*_*hck 23

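You can do this easily with built-in commands.

Feed the contents of the file to sort. We need this for the next step.
The sorted output goes to uniq -c, which counts how many times each distinct line occurs. uniq only compares adjacent lines, so without the sort beforehand identical lines that are not next to each other would be counted separately.
Then feed that to another sort, which now sorts in reverse order (r) and interprets the uniq output numerically (n). We need the numeric option because otherwise the spaces in front of the numbers would lead to wrong results (see the GNU sort help for more information).
Finally, show only the first twelve lines with head.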
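Two of the points above are easy to check on a tiny input. The following is only a minimal sketch: the temporary file, its contents, and the variable names are made up for illustration and are not part of the original answer.

```shell
# Throwaway sample file (hypothetical content): "b" occurs twice, but not on adjacent lines.
tmp=$(mktemp)
printf '%s\n' b a b > "$tmp"

# uniq only merges adjacent duplicates, so without sort it reports three groups.
groups_unsorted=$(uniq -c "$tmp" | wc -l | tr -d ' ')
# After sorting, the two "b" lines are adjacent and are counted together: two groups.
groups_sorted=$(sort "$tmp" | uniq -c | wc -l | tr -d ' ')

# A plain reverse sort compares text character by character, so "9" ranks above "10".
first_text=$(printf '%s\n' 9 10 | sort -r | head -n 1)
# With -n the values are compared as numbers, so 10 comes first.
first_numeric=$(printf '%s\n' 9 10 | sort -rn | head -n 1)

echo "$groups_unsorted $groups_sorted $first_text $first_numeric"   # prints: 3 2 9 10
rm -f "$tmp"
```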

The command would be:

sort test.txt | uniq -c | sort -rn | head -n 12
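If you want a different number of lines or a different file, the same pipeline can be wrapped in a small shell function. This is just one possible sketch; the name top_lines and the default of 12 are my own illustrative choices, not something the standard tools provide.

```shell
# top_lines FILE [N]
# Print the N most frequent lines of FILE, each prefixed with its count.
# "top_lines" is a hypothetical helper name; N defaults to 12 as in the question.
top_lines() {
    file=$1
    n=${2:-12}
    # sort groups identical lines, uniq -c counts each group,
    # sort -rn orders by count (highest first), head keeps the first N.
    sort -- "$file" | uniq -c | sort -rn | head -n "$n"
}

# Usage (assuming test.txt exists in the current directory):
#   top_lines test.txt      # top 12 lines
#   top_lines test.txt 5    # top 5 lines
```

Lines with the same count may come out in either order, depending on how your sort breaks ties.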

The output here includes the actual number of occurrences.

To get only the raw list of lines, you can pipe the output to sed:

sort test.txt | uniq -c | sort -rn | head -n 12 | sed -E 's/^ *[0-9]+ //g'
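To see what the sed expression removes, it can be tried on a single line shaped like uniq -c output. This is a rough sketch with a made-up sample string; the exact width of the leading padding depends on the uniq implementation. Because the pattern is anchored with ^, it can match at most once per line, so the trailing g flag makes no difference and is left out here.

```shell
# A line shaped like uniq -c output (hypothetical sample; padding width varies).
counted='      6 Hello there!'

# ^ *      the leading padding spaces
# [0-9]+   the count itself (-E enables + without a backslash)
# ' '      the single space separating the count from the line
stripped=$(printf '%s\n' "$counted" | sed -E 's/^ *[0-9]+ //')

echo "$stripped"   # prints: Hello there!
```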

Example input:

I'm not there very often
I'm not there very often
Look at me!
Look at me!
Look at me!
Hello there!
Hello there!
Hello there!
Hello there!
Hello there!
Hello there!

Output of the first command, but with head showing only 2 lines:

6 Hello there!
3 Look at me!

Output of the second command:

Hello there!
Look at me!

Archived:	13 years, 9 months ago
Viewed:	12816 times
Last active:	13 years, 5 months ago