I have a very large table of values in the following format:
apple 1 1
apple 2 1
apple 3 1
apple 4 1
banana 25 4
banana 35 10
banana 36 10
banana 37 10
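Column 1 contains many different fruits, and each fruit has a different number of rows.
For each fruit in column 1, I want to calculate the cumulative sum of column 3 and, for each row, that cumulative sum as a percentage of the fruit's total, and add both as new columns. So the desired output is: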
apple 1 1 1 25.00
apple 2 1 2 50.00
apple 3 1 3 75.00
apple 4 1 4 100.00
banana 25 4 4 11.76
banana 35 10 14 41.18
banana 36 10 24 70.59
banana 37 10 34 100.00
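I can get part of the way there with awk, but I'm struggling with how to reset the cumulative sum at each new fruit. Here's my horrible awk attempt for your viewing pleasure: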
#!/bin/bash
awk '{cumsum += $3; $3 = cumsum} 1' fruitfile > cumsum.tmp
total=$(awk '{total=total+$3}END{print total}' fruitfile)
awk -v total=$total '{ printf ("%s\t%s\t%s\t%.5f\n", $1, $2, $3, ($3/total)*100)}' cumsum.tmp > cumsum.txt
rm cumsum.tmp
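The step the attempt above is missing is the reset of the running total when column 1 changes. A minimal sketch of just that step follows, assuming all rows for a fruit are adjacent in the file; the variable `prev`, the function `running_sum`, and the file name `fruit_sample.txt` are illustrative names, not part of the original script.

```shell
#!/bin/bash
# A few rows copied from the question into a placeholder file.
printf '%s\n' 'apple 1 1' 'apple 2 1' 'banana 25 4' 'banana 35 10' > fruit_sample.txt

# Reset the running total whenever column 1 differs from the previous row,
# then append the running total as a new last column.
running_sum() {
    awk '$1 != prev { cumsum = 0; prev = $1 }
         { cumsum += $3; print $0, cumsum }' "$@"
}

running_sum fruit_sample.txt
```

This only covers the cumulative-sum column; the percentage column also needs each fruit's total, which is not known until all of its rows have been read.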
Could you please try the following, written and tested with the shown samples.
awk '
FNR==NR{
  total[$1]+=$NF
  next
}
{
  cum[$1]+=$NF
  printf "%s %s %.2f\n", $0, cum[$1], (cum[$1]/total[$1])*100
}
' Input_file Input_file |
column -t

The output for the shown samples is as follows.
apple   1   1   1   25.00
apple   2   1   2   50.00
apple   3   1   3   75.00
apple   4   1   4   100.00
banana  25  4   4   11.76
banana  35  10  14  41.18
banana  36  10  24  70.59
banana  37  10  34  100.00

Explanation: a detailed, line-by-line explanation of the above.
awk '                        ##Start the awk program.
FNR==NR{                     ##FNR==NR is TRUE only while Input_file is being read for the first time.
  total[$1]+=$NF             ##Array total, indexed by $1, accumulates the last field, giving each fruit its grand total.
  next                       ##next skips all further statements for this line.
}
{                            ##This block runs only during the second read of Input_file.
  cum[$1]+=$NF               ##Array cum, indexed by $1, holds the running sum of the last field for the current fruit.
  printf "%s %s %.2f\n", $0, cum[$1], (cum[$1]/total[$1])*100  ##Print the current line, the running sum, and the running sum as a percentage of the fruit total, to 2 decimal places.
}
' Input_file Input_file |    ##Pass Input_file twice so that it is read in two passes.
column -t                    ##Send the awk output to the column command to align the columns.
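If reading the file twice is not an option (for example, the data arrives on a pipe), a single-pass variant is possible. The following is only a sketch under one assumption: all rows for a given fruit are adjacent, as in the sample. The names `cumsum_by_group`, `emit_group`, and `fruitfile_sample` are placeholders chosen for illustration.

```shell
#!/bin/bash
# Recreate the sample rows from the question (file name is a placeholder).
cat > fruitfile_sample <<'EOF'
apple 1 1
apple 2 1
apple 3 1
apple 4 1
banana 25 4
banana 35 10
banana 36 10
banana 37 10
EOF

# Single pass: buffer one group at a time and print it once its total is known.
# Assumes rows for the same fruit are adjacent in the input.
cumsum_by_group() {
    awk '
    # Print the buffered group with running sum and running percentage.
    function emit_group(   i, cum) {
        for (i = 1; i <= n; i++) {
            cum += val[i]
            printf "%s %s %.2f\n", row[i], cum, (total ? cum / total * 100 : 0)
        }
        n = 0
        total = 0
    }
    $1 != prev { emit_group(); prev = $1 }   # new fruit: flush previous group
    { row[++n] = $0; val[n] = $NF; total += $NF }
    END { emit_group() }
    ' "$@"
}

cumsum_by_group fruitfile_sample
```

Only one fruit's rows are held in memory at a time, so this should scale to a very large table as long as no single group is huge. If the rows are not grouped, sort the input on column 1 first or use the two-pass approach, which does not depend on row order.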