Esc*_*her 18 diff text-processing files
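I have a large file made up of semicolon-separated text fields, laid out as a big table. It is already sorted. I also have a smaller file made up of the same kind of text fields. At some point, someone concatenated this small file with other files and then sorted the result to produce the big file. I want to subtract the lines of the small file from the big file (i.e. for each line in the small file, if an identical line exists in the big file, delete that line from the big file).
The file looks roughly like this: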
GenericClass1; 1; 2; NA; 3; 4;
GenericClass1; 5; 6; NA; 7; 8;
GenericClass2; 1; 5; NA; 3; 8;
GenericClass2; 2; 6; NA; 4; 1;
and so on.
Is there a quick, elegant way to do this, or do I have to use awk?
ter*_*don 34
You can use grep. Give it the small file as its list of patterns and tell it to select the lines that do not match:
grep -vxFf file.txt bigfile.txt > newbigfile.txt
The options used are:
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by
newlines, any of which is to be matched. (-F is specified by
POSIX.)
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file
contains zero patterns, and therefore matches nothing. (-f is
specified by POSIX.)
-v, --invert-match
Invert the sense of matching, to select non-matching lines. (-v
is specified by POSIX.)
-x, --line-regexp
Select only those matches that exactly match the whole line.
(-x is specified by POSIX.)
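A minimal, self-contained sketch of the grep approach, assuming a POSIX shell. It builds bigfile.txt and file.txt in a temporary directory from the sample rows in the question (the small file's contents are made up for illustration) and runs the exact command above:

```shell
#!/bin/sh
# Sketch: subtract file.txt from bigfile.txt with grep -vxFf.
# The file contents are illustrative, taken from the sample rows above.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# bigfile.txt: the sorted, concatenated table
cat > bigfile.txt <<'EOF'
GenericClass1; 1; 2; NA; 3; 4;
GenericClass1; 5; 6; NA; 7; 8;
GenericClass2; 1; 5; NA; 3; 8;
GenericClass2; 2; 6; NA; 4; 1;
EOF

# file.txt: the rows to remove (grep does not care about their order)
cat > file.txt <<'EOF'
GenericClass2; 1; 5; NA; 3; 8;
GenericClass1; 1; 2; NA; 3; 4;
EOF

# -F: patterns are literal strings, not regexes
# -x: a pattern must match the whole line
# -f: read the patterns from file.txt, one per line
# -v: print only the lines that match none of the patterns
grep -vxFf file.txt bigfile.txt > newbigfile.txt
cat newbigfile.txt
```

Because of -x, lines must be byte-for-byte identical, so trailing spaces or Windows (CR LF) line endings in only one of the files will stop them from matching. Note also that grep exits with status 1 when it selects no lines, which matters if you run it under `set -e`.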
Ulr*_*arz 20
comm is your friend:
NAME
comm - compare two sorted files line by line
SYNOPSIS
comm [OPTION]... FILE1 FILE2
DESCRIPTION
Compare sorted files FILE1 and FILE2 line by line.
With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files.
-1 suppress column 1 (lines unique to FILE1)
-2 suppress column 2 (lines unique to FILE2)
-3 suppress column 3 (lines that appear in both files)
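To make the three-column layout concrete, here is a small sketch on two hypothetical, already sorted files (a.txt and b.txt are made up for illustration). Column 2 is indented by one tab and column 3 by two tabs:

```shell
#!/bin/sh
# Sketch: comm's default three-column output on two tiny sorted files.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

printf '%s\n' apple banana cherry > a.txt   # FILE1, sorted
printf '%s\n' banana cherry date > b.txt    # FILE2, sorted

# apple  -> column 1 (only in a.txt), no indent
# date   -> column 2 (only in b.txt), one tab
# banana, cherry -> column 3 (in both), two tabs
out=$(comm a.txt b.txt)
printf '%s\n' "$out"
```

Suppressing columns with -1, -2 and -3 simply removes the corresponding lines (and their indentation) from this output.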
(Because comm takes advantage of the inputs already being sorted, it may have a performance advantage over grep.)
For example:
comm -1 -3 file.txt bigfile.txt > newbigfile.txt
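One caveat: comm only gives correct results if both inputs are sorted with the same collation, and the question only says the big file is sorted. The sketch below is one way to handle that, assuming you can afford to sort: it sorts both files under LC_ALL=C so the order is byte-wise and consistent (if you know bigfile.txt was already sorted that way, re-sorting it is redundant). The file contents are illustrative, and one row is deliberately duplicated in the big file to show a behavioural difference from grep:

```shell
#!/bin/sh
# Sketch: subtract file.txt from bigfile.txt with comm -1 -3.
# Assumption: both files are (re)sorted under LC_ALL=C so that sort and
# comm agree on the ordering.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Note the duplicated "GenericClass2; 1; 5; ..." row.
cat > bigfile.txt <<'EOF'
GenericClass1; 1; 2; NA; 3; 4;
GenericClass1; 5; 6; NA; 7; 8;
GenericClass2; 1; 5; NA; 3; 8;
GenericClass2; 1; 5; NA; 3; 8;
GenericClass2; 2; 6; NA; 4; 1;
EOF
cat > file.txt <<'EOF'
GenericClass2; 1; 5; NA; 3; 8;
GenericClass1; 1; 2; NA; 3; 4;
EOF

export LC_ALL=C
sort file.txt > small.sorted
sort bigfile.txt > big.sorted

# -1 drops lines only in small.sorted, -3 drops lines in both files,
# leaving column 2: lines only in big.sorted.
comm -1 -3 small.sorted big.sorted > newbigfile.txt
cat newbigfile.txt
```

Here newbigfile.txt still contains one copy of the duplicated row: comm pairs lines one-for-one, so each line in the small file cancels a single occurrence in the big file. `grep -vxFf` would instead drop every copy. Which behaviour you want depends on whether the other concatenated files could legitimately contain the same rows.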