Intersection of two files

Question

bde*_*vic 6 unix algorithm intersection file

I have two large files (27k lines and 450k lines). They look like this:

File1:
1 2 A 5
3 2 B 7
6 3 C 8
...

File2:
4 2 C 5
7 2 B 7
6 8 B 8
7 7 F 9
...

I want the lines from both files whose third-column value appears in both files (note that the lines with A and F are excluded):

OUTPUT:
3 2 B 7
6 3 C 8
4 2 C 5
7 2 B 7
6 8 B 8

What is the best way to do this?

Answer 1

Kei*_*all 2

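# Collect the unique third-column values of each file.
awk '{print $3}' file1 | sort | uniq > file1col3
awk '{print $3}' file2 | sort | uniq > file2col3
# Intersect the two value lists, then turn each shared value into a whole-line regex.
grep -Fx -f file1col3 file2col3 | awk '{print "\\w+ \\w+ " $1 " \\w+"}' > col3regexp
# Print every line of either file that matches one of the regexes (-h drops file names).
egrep -xh -f col3regexp file1 file2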

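Get all the unique third-column values from each file, intersect them (using grep -F), print a set of regular expressions that match lines with the wanted column value, and then use egrep to extract those lines from both files. The generated patterns assume four fields separated by single spaces, as in the sample data.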
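As an alternative sketch (not part of the original answer), the same intersection can probably be done in one awk invocation without temporary files or generated regexes. It reads each file twice: the first two passes record which third-column values occur in each file, and the last two passes print the lines whose value was seen in both. The names file1 and file2 follow the question; the sample data, the workdir variable, and the result file are assumptions made for illustration, and the FNR == 1 pass counter assumes neither file is empty.

```shell
# Hypothetical sample data mirroring the example in the question.
workdir=$(mktemp -d)
printf '%s\n' '1 2 A 5' '3 2 B 7' '6 3 C 8' > "$workdir/file1"
printf '%s\n' '4 2 C 5' '7 2 B 7' '6 8 B 8' '7 7 F 9' > "$workdir/file2"

# Each file is listed twice: passes 1-2 collect keys, passes 3-4 print matches.
awk '
    FNR == 1  { pass++ }               # a new input file has started
    pass == 1 { in1[$3] = 1; next }    # third-column values seen in file1
    pass == 2 { in2[$3] = 1; next }    # third-column values seen in file2
    ($3 in in1) && ($3 in in2)         # print lines whose value is in both
' "$workdir/file1" "$workdir/file2" "$workdir/file1" "$workdir/file2" > "$workdir/result"

cat "$workdir/result"
```

Because awk splits on runs of whitespace by default, this version does not depend on the fields being separated by exactly one space. The trade-off is that both files are read twice and the distinct third-column values are held in memory, which should be modest for files of the sizes mentioned in the question.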

Archived:	13 years, 4 months ago
Views:	1138
Last recorded:	13 years, 4 months ago