Calculating the cumulative sum and cumulative percentage of a column, grouped by row

Question

I have a very large table of values in the following format:

apple   1   1 
apple   2   1
apple   3   1
apple   4   1
banana  25  4
banana  35  10
banana  36  10
banana  37  10

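Column 1 contains many different fruits, and the number of rows per fruit varies.

For each fruit in column 1, I want to calculate the cumulative sum of column 3 and, on each row, that cumulative sum as a percentage of the fruit's total, and add both as new columns. So the desired output looks like this: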

apple   1   1   1   25.00 
apple   2   1   2   50.00
apple   3   1   3   75.00
apple   4   1   4   100.00
banana  25  4   4   11.76   
banana  35  10  14  41.18
banana  36  10  24  70.59
banana  37  10  34  100.00

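I can get part of the way there with awk, but I'm struggling with how to reset the cumulative sum at each new fruit. Here is my horrible awk attempt, for your viewing pleasure: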

#!/bin/bash

awk '{cumsum += $3; $3 = cumsum} 1' fruitfile > cumsum.tmp
total=$(awk '{total=total+$3}END{print total}' fruitfile)
awk -v total=$total '{ printf ("%s\t%s\t%s\t%.5f\n", $1, $2, $3, ($3/total)*100)}' cumsum.tmp > cumsum.txt
rm cumsum.tmp

Answer 1

Rav*_*h13 5

Could you please try the following? It was written and tested against the samples shown.

awk '
FNR==NR{
  a[$1]+=$NF
  next
}
{
  b[$1]+=$NF
  sum[$1]+=($NF/a[$1])*100
  print $0,b[$1],sum[$1]
}
' Input_file Input_file |
column -t

For the samples shown, the output is as follows.

apple   1   1   1   25
apple   2   1   2   50
apple   3   1   3   75
apple   4   1   4   100
banana  25  4   4   11.7647
banana  35  10  14  41.1765
banana  36  10  24  70.5882
banana  37  10  34  100

Explanation: a line-by-line walkthrough of the above.

awk '                           ## Start the awk program.
FNR==NR{                        ## True only while the first copy of Input_file is being read.
  a[$1]+=$NF                    ## Add the last field to a[$1], the total for this fruit.
  next                          ## Skip the remaining statements for this line.
}
{                               ## Runs only on the second pass over Input_file.
  b[$1]+=$NF                    ## Running sum of the last field for this fruit.
  sum[$1]+=($NF/a[$1])*100      ## Running percentage: add this row's share of the fruit's total.
  print $0,b[$1],sum[$1]        ## Print the line, then the running sum and the running percentage.
}
' Input_file Input_file |       ## Input_file is named twice so that awk reads it twice.
column -t                       ## Pipe the awk output through column for aligned columns.

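If the file cannot be read twice (for example, when the data arrives on a pipe), a single-pass variant may be worth considering. The following is only a sketch, not part of the answer above: it assumes that all rows for a given fruit are contiguous, as in the sample, and it buffers one fruit's rows at a time until the first column changes. The function name cumsum_pct and the helper emit_group are made up for illustration. The percentage is printed with two decimals, as in the desired output in the question.

```shell
#!/bin/sh
# Hypothetical helper: single-pass running sum and running percentage per group.
# Assumption: rows sharing the same first field are contiguous in the input.
cumsum_pct() {
  awk '
  # Print the buffered group, appending the running sum and running percentage.
  function emit_group(    i, run) {
    run = 0
    for (i = 1; i <= n; i++) {
      run += val[i]
      printf "%s\t%s\t%.2f\n", line[i], run, (total ? run / total * 100 : 0)
    }
    n = 0       # reset the buffer for the next group
    total = 0
  }
  NR > 1 && $1 != key { emit_group() }   # first field changed: finish the previous group
  {
    key = $1
    line[++n] = $0     # remember the original line
    val[n] = $NF       # and its last field
    total += $NF       # group total, known once the whole group has been read
  }
  END { emit_group() } # emit the final group
  ' "$@"
}
```

It could be used as `cumsum_pct fruitfile | column -t`, or at the end of a pipeline with no file argument. Only one fruit's rows are held in memory at a time, which may matter for a very large table.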
