ser*_*gio 4 command-line text-processing
I have two files that look like this:
File 1 (unique IDs):
C84610112
C96209347
C84774620
C84774691
C85594749
C89372772
C89651687
C89845500
C89914896
C91269765
C91526663
C92210411
C92254517
C93709504
C94303303
C95100561
C95100609
C95417520
C95696352
C96045246
C96045496
C96060727
C96076986
and file 2:
1 C95696352 score: -69.785 nathvy = 38 nconfs = 888
2 C98230482 score: -57.431 nathvy = 47 nconfs = 575
3 C96209347 score: -57.128 nathvy = 24 nconfs = 1188
4 C36510773 score: -56.502 nathvy = 38 nconfs = 7595
5 C04355288 score: -56.400 nathvy = 41 nconfs = 50502
6 C89372772 score: -55.728 nathvy = 22 nconfs = 3228
7 C96209347 score: -54.713 nathvy = 24 nconfs = 162
8 C96209347 score: -53.901 nathvy = 24 nconfs = 159
9 C06169346 score: -53.438 nathvy = 22 nconfs = 105
10 C95696352 score: -52.848 nathvy = 38 nconfs = 878
11 C98216318 score: -52.061 nathvy = 52 nconfs = 1092
12 C04285713 score: -52.009 nathvy = 38 nconfs = 1355
13 C96209347 score: -51.477 nathvy = 24 nconfs = 1375
14 C98222837 score: -50.730 nathvy = 34 nconfs = 588
15 C98216318 score: -50.694 nathvy = 52 nconfs = 1136
16 C32832068 score: -50.546 nathvy = 22 nconfs = 548
17 C95696352 score: -50.475 nathvy = 38 nconfs = 3220
18 C32832068 score: -50.457 nathvy = 22 nconfs = 16235
19 C95696352 score: -50.234 nathvy = 38 nconfs = 3048
20 C85594749 score: -49.780 nathvy = 44 nconfs = 4536
21 C72332782 score: -49.676 nathvy = 41 nconfs = 3942
22 C97970648 score: -49.616 nathvy = 45 nconfs = 17640
23 C04285713 score: -49.594 nathvy = 38 nconfs = 14038
24 C98043133 score: -49.370 nathvy = 43 nconfs = 1236
25 C89372772 score: -49.308 nathvy = 22 nconfs = 471
26 C97970648 score: -49.297 nathvy = 45 nconfs = 17850
27 C85594749 score: -49.122 nathvy = 44 nconfs = 4158
28 C70006381 score: -49.092 nathvy = 24 nconfs = 880
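I want to match the IDs in file1 against the IDs in file2 (second column) and print the matching lines. In addition, some IDs in file2 are repeated, for example C96209347 (although the lines are not identical as a whole). I want to grep only the line where each ID first occurs and skip the others. So in this particular example, for C96209347 only the third line of file2 should be printed. Can anyone help?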
Try this:
grep -f file1 file2 | awk '!_[$2]++'
1 C95696352 score: -69.785 nathvy = 38 nconfs = 888
3 C96209347 score: -57.128 nathvy = 24 nconfs = 1188
6 C89372772 score: -55.728 nathvy = 22 nconfs = 3228
20 C85594749 score: -49.780 nathvy = 44 nconfs = 4536
Explanation
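grep -f file1 file2: searches file2 for lines matching the patterns read from file1.
awk '!_[$2]++': if field $2 has already been seen, prints nothing (the line is skipped).
_ is the array name (it can be anything, e.g. "seen").
_[$2]++ creates an array entry whose key is the content of field $2 and adds 1 to it. The increment happens after the value is tested, so the test sees 0 the first time an ID appears.
If _[$2] has not (!) been set yet, the line is printed. print is the default action awk performs when a condition matches.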
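One thing to keep in mind: grep -f treats every line of file1 as a pattern and matches it anywhere in a line of file2, not only in the second column. If that matters for your data, the lookup and the de-duplication can be done in a single awk call that compares field 2 exactly. This is only a sketch under assumptions: the array names ids and seen and the small sample files are mine, built from a subset of the data in the question.

```shell
# Create small sample files (a subset of the data from the question).
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
C96209347
C89372772
C95696352
EOF
cat > "$tmpdir/file2" <<'EOF'
1 C95696352 score: -69.785 nathvy = 38 nconfs = 888
2 C98230482 score: -57.431 nathvy = 47 nconfs = 575
3 C96209347 score: -57.128 nathvy = 24 nconfs = 1188
6 C89372772 score: -55.728 nathvy = 22 nconfs = 3228
7 C96209347 score: -54.713 nathvy = 24 nconfs = 162
10 C95696352 score: -52.848 nathvy = 38 nconfs = 878
EOF

# While reading the first file (NR==FNR), remember each ID and move on.
# While reading the second file, print a line only if its 2nd field is a
# known ID and this is the first time that ID shows up.
result=$(awk 'NR==FNR { ids[$1]; next } ($2 in ids) && !seen[$2]++' \
    "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

With the sample above, the lines numbered 1, 3 and 6 are kept. Line 2 has an ID that is not in file1, and lines 7 and 10 repeat IDs that were already printed.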