I want to find the lines common to two large files (one has 90 million lines, the other 1 million), together with their line numbers.
comm -12 file1 file2
gives me the common lines, but I also want to know their line numbers in each file.
You can try:
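awk '
FNR==NR {            # true only while reading the first file
    a[$0]++          # remember every line of file1
    next
}
$0 in a {            # second file: this line also occurs in file1
    print
    delete a[$0]     # so that each common line is printed only once
}' file1 file2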
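To see what that command prints, here is a small sketch on throwaway data. The file names small.txt and big.txt and their contents are made up for illustration. Since the first file named on the command line is the one held in memory, it is probably worth listing the 1-million-line file first and streaming the 90-million-line one second.

```shell
# Hypothetical sample data; small.txt and big.txt are illustrative names only.
tmp=$(mktemp -d)
printf '%s\n' apple banana cherry > "$tmp/small.txt"
printf '%s\n' durian banana apple banana > "$tmp/big.txt"

# The first file is loaded into the array, the second is only streamed.
common=$(awk '
FNR==NR { a[$0]++; next }          # first file: remember every line
$0 in a { print; delete a[$0] }    # second file: print each common line once
' "$tmp/small.txt" "$tmp/big.txt")

printf '%s\n' "$common"
rm -rf "$tmp"
```

The common lines come out in the order they appear in the second file, and the repeated banana is printed only once because of the delete.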
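If you also want the line numbers, you can use arrays of arrays in gawk version 4, as follows (save the script to a file and run it with gawk -f, passing the two files as arguments):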
FNR==NR {
    a[$0][FNR]++
    file1=FILENAME
    next
}
FNR==1 {
    file2=FILENAME
}
$0 in a {
    b[$0][FNR]++
}
END {
    for(i in b) {
        print "Line: " i
        print " Line numbers in "file1":"
        printf "  "
        for (j in a[i])
            printf "%s,", j
        print ""
        print " Line numbers in "file2":"
        printf "  "
        for (j in b[i])
            printf "%s,", j
        print ""
    }
}