A Unix tool for subtracting text files?

Question

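I have a large file made up of semicolon-separated text fields, laid out as a big table. It is sorted. I also have a smaller file made up of the same kind of text fields. At some point, someone concatenated this file with others and then sorted the result to produce the large file described above. I want to subtract the small file's lines from the large file (that is, for each line in the small file, if a matching string exists in the large file, delete that line from the large file).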

The file looks roughly like this:

GenericClass1; 1; 2; NA; 3; 4;
GenericClass1; 5; 6; NA; 7; 8;
GenericClass2; 1; 5; NA; 3; 8;
GenericClass2; 2; 6; NA; 4; 1;

and so on.

Is there a quick and elegant way to do this, or do I have to use awk?

Answer 1

ter*_*don 34

You can use grep. Give it the small file as the list of patterns and tell it to print the lines that do not match:

grep -vxFf file.txt bigfile.txt > newbigfile.txt

The options used are:

   -F, --fixed-strings
          Interpret PATTERN as a  list  of  fixed  strings,  separated  by
          newlines,  any  of  which is to be matched.  (-F is specified by
          POSIX.)
   -f FILE, --file=FILE
          Obtain  patterns  from  FILE,  one  per  line.   The  empty file
          contains zero patterns, and therefore matches nothing.   (-f  is
          specified by POSIX.)

   -v, --invert-match
          Invert the sense of matching, to select non-matching lines.  (-v
          is specified by POSIX.)
   -x, --line-regexp
          Select only those matches that exactly match the whole line.  
          (-x is specified by POSIX.)

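To see how these options combine, here is a minimal sketch on the sample rows from the question. The temporary directory and the choice of which rows go into file.txt are assumptions made for illustration; only the grep command itself comes from the answer.

```shell
# Work in a scratch directory (an assumption for this sketch) so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The big, sorted file: the four sample rows from the question.
cat > bigfile.txt <<'EOF'
GenericClass1; 1; 2; NA; 3; 4;
GenericClass1; 5; 6; NA; 7; 8;
GenericClass2; 1; 5; NA; 3; 8;
GenericClass2; 2; 6; NA; 4; 1;
EOF

# The small file: rows to subtract (picked arbitrarily for the example).
cat > file.txt <<'EOF'
GenericClass1; 5; 6; NA; 7; 8;
GenericClass2; 2; 6; NA; 4; 1;
EOF

# -f file.txt : read the patterns from file.txt, one per line
# -F          : treat each pattern as a fixed string, not a regex
# -x          : a pattern must match the whole line
# -v          : print the lines that do NOT match
grep -vxFf file.txt bigfile.txt > newbigfile.txt

cat newbigfile.txt
# GenericClass1; 1; 2; NA; 3; 4;
# GenericClass2; 1; 5; NA; 3; 8;
```

Without -x, a short line in file.txt would also remove any longer line that merely contains it, and without -F characters such as `.` or `*` in the data would be read as regular-expression syntax.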

Answer 2

Ulr*_*arz 20

comm is your friend:

NAME
   comm - compare two sorted files line by line

SYNOPSIS
   comm [OPTION]... FILE1 FILE2

DESCRIPTION
   Compare sorted files FILE1 and FILE2 line by line.

   With  no  options, produce three-column output.  Column one contains lines unique to FILE1, column two contains
   lines unique to FILE2, and column three contains lines common to both files.

   -1     suppress column 1 (lines unique to FILE1)

   -2     suppress column 2 (lines unique to FILE2)

   -3     suppress column 3 (lines that appear in both files)

(Because comm takes the sorted order into account, it may have a performance advantage over grep.)

For example:

comm -1 -3 file.txt bigfile.txt > newbigfile.txt

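The comm approach can be tried out with a small sketch like the one below. The sample rows, the scratch directory, and the output file names comm_result.txt and grep_result.txt are assumptions for illustration. The explicit `LC_ALL=C` is also an addition: comm expects both inputs to be sorted in the collation order it uses itself, so sorting and comparing under the same locale is a cautious way to guarantee that.

```shell
# Scratch directory (an assumption for this sketch).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# bigfile.txt deliberately contains one row twice.
printf '%s\n' \
  'GenericClass1; 1; 2; NA; 3; 4;' \
  'GenericClass1; 5; 6; NA; 7; 8;' \
  'GenericClass1; 5; 6; NA; 7; 8;' \
  'GenericClass2; 1; 5; NA; 3; 8;' > bigfile.txt

# file.txt contains that row once.
printf '%s\n' 'GenericClass1; 5; 6; NA; 7; 8;' > file.txt

# Sort both inputs under the same locale that comm will run in.
LC_ALL=C sort -o bigfile.txt bigfile.txt
LC_ALL=C sort -o file.txt file.txt

# -1 drops lines unique to file.txt, -3 drops lines common to both,
# leaving only the lines unique to bigfile.txt.
LC_ALL=C comm -13 file.txt bigfile.txt > comm_result.txt

# For comparison, the grep approach on the same input.
grep -vxFf file.txt bigfile.txt > grep_result.txt

cat comm_result.txt
# GenericClass1; 1; 2; NA; 3; 4;
# GenericClass1; 5; 6; NA; 7; 8;
# GenericClass2; 1; 5; NA; 3; 8;
```

The two tools differ on duplicates: comm pairs lines off one for one, so a row that appears twice in bigfile.txt and once in file.txt survives once, whereas grep -v removes every occurrence. Which behaviour is wanted depends on whether the big file may legitimately contain repeated rows.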

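Regarding the benefit of using comm rather than grep on sorted lists: this would be a better answer if you gave a specific command-line example, such as `comm -1 -3 file.txt bigfile.txt > newbigfile.txt`. (2 upvotes)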
