I have a tab-separated dataset like this:
A B C D
1 aaa 1 2
1 aaa 3 4
1 aaa 5 6
1 bbb 7 8
1 ccc 9 1
1 ccc 2 3
1 ddd 4 5
1 ddd 6 7
1 ddd 8 9
1 ddd 1 2
Desired output (only rows whose value in column B occurs at least 3 times):
A B C D
1 aaa 1 2
1 aaa 3 4
1 aaa 5 6
1 ddd 4 5
1 ddd 6 7
1 ddd 8 9
1 ddd 1 2
I tried this:
awk '++a[$2]>3' test.tsv test.tsv > test-2.tsv
But it gives this unwanted output:
1 ddd 1 2
1 aaa 1 2
1 aaa 3 4
1 aaa 5 6
1 ccc 2 3
1 ddd 4 5
1 ddd 6 7
1 ddd 8 9
1 ddd 1 2
You can try this two-pass awk, which reads the file twice (in bash, test.tsv{,} expands to test.tsv test.tsv):
awk -F '\t' 'FNR==NR {freq[$2]++; next} freq[$2] >= 3' test.tsv{,}
1 aaa 1 2
1 aaa 3 4
1 aaa 5 6
1 ddd 4 5
1 ddd 6 7
1 ddd 8 9
1 ddd 1 2
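Note that the output above drops the header line, because the header's own value in column 2 ("B") occurs only once. The attempt in the question fails for a different reason: ++a[$2] keeps counting across both reads of the file, so the test compares a running total rather than the final per-key count. Below is a minimal sketch of a variant that keeps the header, assuming the first line of the file is always a header. The printf step only rebuilds the question's sample data so the example is self-contained, and the file names are the ones used in the question.

```shell
# Rebuild the sample from the question as a tab-separated file
# (printf reuses the format string for every group of four arguments).
printf '%s\t%s\t%s\t%s\n' \
  A B C D \
  1 aaa 1 2 \
  1 aaa 3 4 \
  1 aaa 5 6 \
  1 bbb 7 8 \
  1 ccc 9 1 \
  1 ccc 2 3 \
  1 ddd 4 5 \
  1 ddd 6 7 \
  1 ddd 8 9 \
  1 ddd 1 2 > test.tsv

# Pass 1 (FNR == NR is true only while reading the first file argument):
#   count how many times each value of column 2 occurs, then skip to the next line.
# Pass 2: print line 1 unconditionally (assumed to be the header),
#   plus every row whose column-2 value was seen at least 3 times.
awk -F '\t' '
  FNR == NR { freq[$2]++; next }
  FNR == 1 || freq[$2] >= 3
' test.tsv test.tsv > test-2.tsv
```

With the sample data, test-2.tsv should then hold the header followed by the three aaa rows and the four ddd rows, which matches the desired output. Writing the file name twice instead of test.tsv{,} keeps the command usable in shells without brace expansion.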