I want to find the common lines between two large files, one with 90 million lines and one with 1 million lines, together with their line numbers.
comm -12 file1 file2
gives me the common lines, but I also want to know their line numbers in each file.
You can try:
awk '
FNR==NR {
    # first file: remember every line
    a[$0]++
    next
}
$0 in a {
    # second file: print each common line once
    print
    delete a[$0]
}' file1 file2
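The array in the command above holds every distinct line of whichever file is named first, so with inputs of this size it is probably worth naming the 1-million-line file first. A minimal sketch of that ordering, where small.txt and big.txt are hypothetical stand-ins for the two real files:

```shell
# Hypothetical sample data standing in for the 1M-line and 90M-line files.
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/small.txt"
printf 'x\nb\ny\nc\nb\n' > "$tmp/big.txt"

# Only the lines of the first (smaller) file are kept in memory;
# the second file is streamed one line at a time.
out=$(awk '
FNR == NR  { seen[$0]; next }            # first file: remember each line
$0 in seen { print; delete seen[$0] }    # second file: print each common line once
' "$tmp/small.txt" "$tmp/big.txt")

echo "$out"
rm -r "$tmp"
```

Because of the delete, a common line is printed only at its first occurrence in the second file.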
If you also want the line numbers, you can use arrays of arrays, which are available in gawk version 4, like this:
FNR==NR {
    # first file: record every line number at which each line occurs
    a[$0][FNR]++
    file1=FILENAME
    next
}
FNR==1 {
    file2=FILENAME
}
$0 in a {
    # second file: record the line numbers of lines also seen in the first file
    b[$0][FNR]++
}
END {
    for(i in b) {
        print "Line: " i
        print " Line numbers in "file1":"
        printf " "
        for (j in a[i])
            printf "%s,", j
        print ""
        print " Line numbers in "file2":"
        printf " "
        for (j in b[i])
            printf "%s,", j
        print ""
    }
}
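The gawk program above stores line numbers for both files and prints everything in END. A lighter sketch, not taken from the answer, assumes that it is acceptable to emit one output row per match as the second file is read. It uses only plain awk arrays, so gawk 4 is not required. The file names small.txt and big.txt and the tab-separated output layout are assumptions made for illustration.

```shell
# Hypothetical sample data; in practice the smaller file goes first.
tmp=$(mktemp -d)
printf 'a\nb\nc\nb\n' > "$tmp/small.txt"
printf 'x\nb\ny\nc\n' > "$tmp/big.txt"

result=$(awk '
FNR == NR {
    # first file: keep a comma-separated list of line numbers per distinct line
    if ($0 in nums)
        nums[$0] = nums[$0] "," FNR
    else
        nums[$0] = FNR
    next
}
$0 in nums {
    # second file: report each match at once, nothing is stored for this file
    # columns: line text, line numbers in first file, line number in second file
    printf "%s\t%s\t%d\n", $0, nums[$0], FNR
}
' "$tmp/small.txt" "$tmp/big.txt")

echo "$result"
rm -r "$tmp"
```

With this layout, memory use depends only on the first file, so the 1-million-line file is the natural choice for that position. If the lines themselves can contain tabs, a different separator would be needed.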