List of patterns to remove from a file

Question

To illustrate, we have two files with the following contents:

File 1

hello
1_hello 
2_hello
world
1_world
2_world
hello1
1_hello1
2_hello1
world1
1_world1
2_world1

File 2

This
hello
1_hello
2_hello
is world
1_world
2_world
my
hello1
1_hello1
2_hello1
word
world1
1_world1
2_world1
file

What I want is to iterate over the first column of file1, delete the matching entries from file2, and produce the following output:

This
is
my 
word
file

How should I proceed?

Answer 1

You want to use awk to read file1 and remember all of its words, then read file2 and print every word that was not seen in file1:

gawk -v RS='[[:space:]]+' 'NR==FNR {words[$1]=1; next} !($1 in words)' file1 file2

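This uses any run of whitespace as the record separator, so each word is treated as a separate "line". That makes the command GNU awk specific, but that is the default awk on Ubuntu.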
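
If gawk is not available, the same idea can probably be expressed without a regular-expression record separator by looping over the fields of each line. The following is only a sketch, not part of the original answer: it recreates the two sample files in a temporary directory (the `tmpdir` and `result` variable names are just for illustration) and assumes that words are separated by ordinary spaces, tabs, or newlines.

```shell
# Sketch: same filtering without a regex RS, so it should run on any POSIX awk.
# The temporary directory and the sample data exist only for demonstration.
tmpdir=$(mktemp -d)

printf '%s\n' hello '1_hello ' 2_hello world 1_world 2_world \
    hello1 1_hello1 2_hello1 world1 1_world1 2_world1 > "$tmpdir/file1"

printf '%s\n' This hello 1_hello 2_hello 'is world' 1_world 2_world my \
    hello1 1_hello1 2_hello1 word world1 1_world1 2_world1 file > "$tmpdir/file2"

result=$(awk '
    # First file: remember every whitespace-separated word.
    NR == FNR { for (i = 1; i <= NF; i++) words[$i] = 1; next }
    # Second file: print each word that was not seen in the first file.
    { for (i = 1; i <= NF; i++) if (!($i in words)) print $i }
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$result"
rm -r "$tmpdir"
```

As with the gawk one-liner, the `NR == FNR` test only identifies the first file correctly when file1 is not empty.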