How do I remove lines from one file that exist in another file?

Question

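Every day I receive a file with 10,000 records, 99% of which were already in the previous day's file. How can I use the macOS command line to remove the lines in the new file that also exist in the previous day's file?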

remove_duplicates newfile oldfile

The files look like this:

"First Last"\t"email"\t"phone"\t"9 more columns..."

Note that I tried this awk solution, but it produced no output, even though I had confirmed that there were duplicate lines.

Answer 1

You can probably use grep with the -v (invert-match) and -f (file) options:

grep -v -f oldfile newfile > newstrip

This matches every line in newfile that is not in oldfile and saves those lines to newstrip. If you are happy with the result, you can then simply run:

mv newstrip newfile

This overwrites newfile with the contents of newstrip; newstrip itself no longer exists afterwards.
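
One caveat with the command above: with -f alone, grep treats each line of oldfile as a regular expression and matches it anywhere within a line, so characters such as . or + in an email address or phone number can cause lines to be dropped that are not exact duplicates. A minimal sketch of a stricter variant follows, assuming whole-line, literal comparison is what is wanted. The function name remove_duplicates is taken from the question and is hypothetical; it is not an existing tool.

```shell
# Hypothetical helper (name borrowed from the question, not a standard command).
# Usage: remove_duplicates newfile oldfile
remove_duplicates() {
    newfile=$1
    oldfile=$2

    # Temporary file next to newfile, so the final mv stays on one filesystem.
    tmpfile=$(mktemp "${newfile}.XXXXXX") || return 1

    # -F: treat each line of oldfile as a fixed string, not a regex
    # -x: only count a match if it covers the whole line
    # -v: print the lines that do NOT match
    # -f: read the patterns from oldfile
    grep -F -x -v -f "$oldfile" "$newfile" > "$tmpfile"
    status=$?

    # grep exits with 1 when it selects no lines (every line was a duplicate),
    # which is not an error here; anything above 1 is a real failure.
    if [ "$status" -gt 1 ]; then
        rm -f "$tmpfile"
        return "$status"
    fi

    # Replace newfile with the filtered result.
    mv "$tmpfile" "$newfile"
}
```

After defining the function in the current shell, it is called exactly as in the question:

remove_duplicates newfile oldfile

Because the result goes through a file created by mktemp, newfile may end up with more restrictive permissions than it had before, so this is worth checking if other users or programs need to read it.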