Bash - How to print the number of times a column value occurs at the end of each line

Tom*_*ith 0 bash awk sed

I have a tab-delimited file...

123 1:2334523   yes
127 1:332443    yes
113 1:332443    no
115 1:55434     no
115 1:55434     no
115 1:55434     yes

I want to count how many times each value in column 2 occurs in column 2, and then print that count at the end of the line, like...

123 1:2334523   yes 1
127 1:332443    yes 2
113 1:332443    no  2
115 1:55434     no  3
115 1:55434     no  3   
115 1:55434     yes 3

So in column 2, 1:332443 occurs twice and 1:55434 occurs 3 times.

I think this should be relatively easy in Awk or sed, but I haven't managed to figure it out.

Win*_*ute 5

You can do it like this:

awk 'NR == FNR { ++ctr[$2]; next } { print $0 "\t" ctr[$2]; }' filename filename
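To check the one-liner against the sample data from the question, a minimal sketch could look like this. The temporary file created with mktemp and the variable names sample and two_pass are assumptions made for illustration, and the data is assumed to be tab-separated as the question states.

```shell
# Recreate the sample data from the question in a temporary file
# (hypothetical location; any regular file works).
sample=$(mktemp)
printf '123\t1:2334523\tyes\n127\t1:332443\tyes\n113\t1:332443\tno\n' >  "$sample"
printf '115\t1:55434\tno\n115\t1:55434\tno\n115\t1:55434\tyes\n'      >> "$sample"

# Pass the same file twice: the first read counts, the second read prints.
two_pass=$(awk 'NR == FNR { ++ctr[$2]; next } { print $0 "\t" ctr[$2] }' "$sample" "$sample")

printf '%s\n' "$two_pass"
rm -f "$sample"
```

The printed lines should match the desired output in the question, with the count appended as a fourth tab-separated column.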

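Because we need to know the counts before we print anything, we have to make two passes over the file, which is why filename is given twice. The awk code, annotated: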

NR == FNR {    # if the record number is the same as the record number in the
               # current file (that is: in the first pass)
  ++ctr[$2]    # count how often field 2 showed up
  next         # don't do anything else for the first pass
}
{              # then in the second pass:
  print $0 "\t" ctr[$2];   # print the line, a tab, and the counter.
}
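If the input comes from a pipe and so cannot be read twice, one alternative (not part of the answer above) is to buffer the lines in memory and print them from an END block. This is only a sketch under that assumption: the function name count_col2 is hypothetical, and the whole input has to fit in memory.

```shell
# Single-pass variant: remember every line and its column-2 key,
# then append the final counts once all input has been read.
count_col2() {
  awk '
    { line[NR] = $0; key[NR] = $2; ++ctr[$2] }   # buffer line, count field 2
    END {
      for (i = 1; i <= NR; i++)                  # replay lines in input order
        print line[i] "\t" ctr[key[i]]
    }
  '
}

# Usage: feed tab-separated data on standard input.
printf '127\t1:332443\tyes\n113\t1:332443\tno\n115\t1:55434\tno\n' | count_col2
```

The two-pass version uses less memory, since it only stores the counters; the buffered version trades memory for the ability to read from a stream.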