Compare two unsorted lists in Linux and list the entries unique to the second file

Question

mvr*_*sen 34 linux bash shell comparison grep

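I have 2 files, each containing a list of numbers (phone numbers).

I'm looking for a way to list the numbers in the second file that do not exist in the first file.

I have tried various approaches: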

comm (getting some weird sorting errors)
fgrep -v -x -f second-file.txt first-file.txt (unsure of the result, there should be more)

Thanks

Answer 1

Har*_*non 63

grep -Fxv -f first-file.txt second-file.txt

This finds all lines in second-file.txt that do not exactly match any line in first-file.txt. It may be slow if the files are large.
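A minimal sketch of how the flags behave, using made-up phone numbers (the sample data and the temporary directory are assumptions for illustration, not from the question):

```shell
# Hypothetical sample data in a scratch directory
tmpdir=$(mktemp -d)
printf '%s\n' 5551001 5551002 5551003 > "$tmpdir/first-file.txt"
printf '%s\n' 5551003 5551004 5551001 5551005 > "$tmpdir/second-file.txt"

# -F  treat patterns as fixed strings, not regular expressions
# -x  a pattern must match the whole line
# -v  print the lines that do NOT match
# -f  read the patterns (one per line) from first-file.txt
only_in_second=$(grep -Fxv -f "$tmpdir/first-file.txt" "$tmpdir/second-file.txt")
printf '%s\n' "$only_in_second"

rm -r "$tmpdir"
```

The result keeps the order of second-file.txt, and neither file needs to be sorted. Without -x, a number such as 555100 in the first file would also suppress 5551004 in the second, because it matches as a substring.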

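Also, once you sort the files, comm should work as well. Use plain sort rather than sort -n even for numeric data, since comm expects lexicographic order. What error does it give? Try this: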

comm -23 second-file-sorted.txt first-file-sorted.txt

Answer 2

rus*_*ush 24

You need to use comm:

comm -13 first.txt second.txt

will do it.

PS. The order of the first and second file on the command line matters.

You may also need to sort the files first:

comm -13 <(sort first.txt) <(sort second.txt)

If the files are numeric, add the -n option to sort.

Keep in mind that sorting the files numerically may not work, because comm expects them to be sorted lexicographically. (2 upvotes)
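To illustrate why the argument order matters, here is a small sketch with made-up numbers (the data and file names under the scratch directory are assumptions). It sorts into temporary files instead of using <(...) so that it also runs in shells without process substitution, and it sets LC_ALL=C so that sort and comm agree on the ordering:

```shell
# Hypothetical sample data in a scratch directory
tmpdir=$(mktemp -d)
printf '%s\n' 5551001 5551002 5551003 > "$tmpdir/first.txt"
printf '%s\n' 5551003 5551004 5551001 5551005 > "$tmpdir/second.txt"

# Plain (lexicographic) sort, same locale as the later comm calls
LC_ALL=C sort "$tmpdir/first.txt"  > "$tmpdir/first.sorted"
LC_ALL=C sort "$tmpdir/second.txt" > "$tmpdir/second.sorted"

# -1 hides lines found only in the first argument,
# -3 hides lines common to both, so only "second-only" lines remain
only_in_second=$(LC_ALL=C comm -13 "$tmpdir/first.sorted" "$tmpdir/second.sorted")

# Swapping the arguments answers the opposite question
only_in_first=$(LC_ALL=C comm -13 "$tmpdir/second.sorted" "$tmpdir/first.sorted")

printf 'only in second:\n%s\n' "$only_in_second"
printf 'only in first:\n%s\n' "$only_in_first"

rm -r "$tmpdir"
```

With these inputs the first call lists 5551004 and 5551005, while the swapped call lists 5551002.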

Answer 3

Nah*_*eul 7

This should work:

comm -13 <(sort file1) <(sort file2)

It seems that sort -n (numeric) cannot be used with comm, which expects its input in the order produced by plain sort (alphanumeric).

With f1.txt and f2.txt as input, 21 should appear in the third column (lines common to both files):

#WRONG
$ comm <(sort -n f1.txt) <(sort -n f2.txt)   
                1
2
21
        3
        21
                50

#OK
$ comm <(sort f1.txt) <(sort f2.txt)
                1
2
                21
        3
                50
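The comparison above can be reproduced with a short sketch. The contents of f1.txt and f2.txt are not shown in this thread, so the values below are an assumption inferred from the comm output (f1 holds 1, 2, 21, 50 and f2 holds 1, 3, 21, 50). The sort -c step checks whether a file is already in the order comm expects:

```shell
tmpdir=$(mktemp -d)
# Assumed file contents, inferred from the comm output above
printf '%s\n' 1 2 21 50 > "$tmpdir/f1.txt"
printf '%s\n' 1 3 21 50 > "$tmpdir/f2.txt"

# A numeric sort of f2 gives 1 3 21 50, which is NOT lexicographic order
sort -n "$tmpdir/f2.txt" > "$tmpdir/f2.numeric"
if LC_ALL=C sort -c "$tmpdir/f2.numeric" 2>/dev/null; then
    numeric_ok=yes
else
    numeric_ok=no   # "3" sorts after "21" when compared as text
fi

# Plain sort gives the order comm expects
LC_ALL=C sort "$tmpdir/f1.txt" > "$tmpdir/f1.sorted"
LC_ALL=C sort "$tmpdir/f2.txt" > "$tmpdir/f2.sorted"

# -12 keeps only column 3 (lines in both files)
common=$(LC_ALL=C comm -12 "$tmpdir/f1.sorted" "$tmpdir/f2.sorted")
# -13 keeps only column 2 (lines unique to f2)
only_f2=$(LC_ALL=C comm -13 "$tmpdir/f1.sorted" "$tmpdir/f2.sorted")

printf 'numeric order accepted: %s\n' "$numeric_ok"
printf 'common:\n%s\n' "$common"
printf 'only in f2:\n%s\n' "$only_f2"

rm -r "$tmpdir"
```

Under these assumed inputs, 21 lands among the common lines and 3 is the only line unique to f2, which matches the #OK output.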

Archived: 14 years ago
Views: 66,350
Last activity: 7 years ago