I have a tab-delimited file...
123 1:2334523 yes
127 1:332443 yes
113 1:332443 no
115 1:55434 no
115 1:55434 no
115 1:55434 yes
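I want to count how many times each value in column 2 occurs in column 2, and then append that count to the end of the line, like this...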
123 1:2334523 yes 1
127 1:332443 yes 2
113 1:332443 no 2
115 1:55434 no 3
115 1:55434 no 3
115 1:55434 yes 3
So in column 2, 1:332443 appears twice and 1:55434 appears three times.
I think this should be fairly easy in awk or sed, but I haven't managed to figure it out yet.
You can do it like this:
你可以这样做:
awk 'NR == FNR { ++ctr[$2]; next } { print $0 "\t" ctr[$2]; }' filename filename
Because we need to know the count before printing a line, we have to make two passes over the file, which is why filename is given twice. The awk code, with comments, is:
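To try the command on the sample rows from the question, here is a small shell sketch. It wraps the same two-pass awk program in a helper whose name, append_col2_count, is made up for illustration, and it adds -F '\t' (an assumption, not part of the original answer) so that only tabs separate fields and a column containing spaces would still count as one field. It should print the six lines of the desired output shown in the question.

```shell
#!/bin/sh
# Hypothetical helper: takes one file name and reads that file twice.
# -F '\t' makes the tab the only field separator.
append_col2_count() {
  awk -F '\t' 'NR == FNR { ++ctr[$2]; next } { print $0 "\t" ctr[$2] }' "$1" "$1"
}

# Write the sample rows from the question to a temporary tab-separated file
tmp=$(mktemp)
printf '%s\t%s\t%s\n' \
  123 1:2334523 yes \
  127 1:332443 yes \
  113 1:332443 no \
  115 1:55434 no \
  115 1:55434 no \
  115 1:55434 yes > "$tmp"

# First pass counts column 2; second pass prints each line plus its count
append_col2_count "$tmp"
rm -f "$tmp"
```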
NR == FNR {              # NR (overall record number) equals FNR (record number
                         # within the current file) only while reading the first file, i.e. the first pass
++ctr[$2] # count how often field 2 showed up
next # don't do anything else for the first pass
}
{ # then in the second pass:
print $0 "\t" ctr[$2]; # print the line, a tab, and the counter.
}
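The two-pass trick needs a real file, because standard input from a pipe cannot be read twice. A possible single-pass alternative is sketched below under that assumption: it keeps every line in memory, counts column 2 as it goes, and prints everything in the original order in the END block. That trades the second read for memory proportional to the file size. The helper name count_col2_stdin is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads tab-separated lines from stdin, buffers them,
# and appends the column-2 count to each line once all input has been read.
count_col2_stdin() {
  awk -F '\t' '
    { line[NR] = $0; key[NR] = $2; ++ctr[$2] }                       # store line, remember its key, count it
    END { for (i = 1; i <= NR; i++) print line[i] "\t" ctr[key[i]] } # replay lines in original order
  '
}

# Usage: pipe the sample rows from the question straight in, no file needed
printf '%s\t%s\t%s\n' \
  123 1:2334523 yes \
  127 1:332443 yes \
  113 1:332443 no \
  115 1:55434 no \
  115 1:55434 no \
  115 1:55434 yes | count_col2_stdin
```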