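I have a problem with an awk script. I need to generate a report containing the lowest, highest, and average score for each assignment in a data file. The assignment name is in column 3.
Input data: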
Student,Catehory,Assignment,Score,Possible
Chelsey,Homework,H01,90,100
Chelsey,Homework,H02,89,100
Chelsey,Homework,H03,77,100
Chelsey,Homework,H04,80,100
Chelsey,Homework,H05,82,100
Chelsey,Homework,H06,84,100
Chelsey,Homework,H07,86,100
Chelsey,Lab,L01,91,100
Chelsey,Lab,L02,100,100
Chelsey,Lab,L03,100,100
Chelsey,Lab,L04,100,100
Chelsey,Lab,L05,96,100
Chelsey,Lab,L06,80,100
Chelsey,Lab,L07,81,100
Chelsey,Quiz,Q01,100,100
Chelsey,Quiz,Q02,100,100
Chelsey,Quiz,Q03,98,100
Chelsey,Quiz,Q04,93,100
Chelsey,Quiz,Q05,99,100
Chelsey,Quiz,Q06,88,100
Chelsey,Quiz,Q07,100,100
Chelsey,Final,FINAL,82,100
Chelsey,Survey,WS,5,5
Sam,Homework,H01,19,100
Sam,Homework,H02,82,100
Sam,Homework,H03,95,100
Sam,Homework,H04,46,100
Sam,Homework,H05,82,100
Sam,Homework,H06,97,100
Sam,Homework,H07,52,100
Sam,Lab,L01,41,100
Sam,Lab,L02,85,100
Sam,Lab,L03,99,100
Sam,Lab,L04,99,100
Sam,Lab,L05,0,100
Sam,Lab,L06,0,100
Sam,Lab,L07,0,100
Sam,Quiz,Q01,91,100
Sam,Quiz,Q02,85,100
Sam,Quiz,Q03,33,100
Sam,Quiz,Q04,64,100
Sam,Quiz,Q05,54,100
Sam,Quiz,Q06,95,100
Sam,Quiz,Q07,68,100
Sam,Final,FINAL,58,100
Sam,Survey,WS,5,5
Andrew,Homework,H01,25,100
Andrew,Homework,H02,47,100
Andrew,Homework,H03,85,100
Andrew,Homework,H04,65,100
Andrew,Homework,H05,54,100
Andrew,Homework,H06,58,100
Andrew,Homework,H07,52,100
Andrew,Lab,L01,87,100
Andrew,Lab,L02,45,100
Andrew,Lab,L03,92,100
Andrew,Lab,L04,48,100
Andrew,Lab,L05,42,100
Andrew,Lab,L06,99,100
Andrew,Lab,L07,86,100
Andrew,Quiz,Q01,25,100
Andrew,Quiz,Q02,84,100
Andrew,Quiz,Q03,59,100
Andrew,Quiz,Q04,93,100
Andrew,Quiz,Q05,85,100
Andrew,Quiz,Q06,94,100
Andrew,Quiz,Q07,58,100
Andrew,Final,FINAL,99,100
Andrew,Survey,WS,5,5
Ava,Homework,H01,55,100
Ava,Homework,H02,95,100
Ava,Homework,H03,84,100
Ava,Homework,H04,74,100
Ava,Homework,H05,95,100
Ava,Homework,H06,84,100
Ava,Homework,H07,55,100
Ava,Lab,L01,66,100
Ava,Lab,L02,77,100
Ava,Lab,L03,88,100
Ava,Lab,L04,99,100
Ava,Lab,L05,55,100
Ava,Lab,L06,66,100
Ava,Lab,L07,77,100
Ava,Quiz,Q01,88,100
Ava,Quiz,Q02,99,100
Ava,Quiz,Q03,44,100
Ava,Quiz,Q04,55,100
Ava,Quiz,Q05,66,100
Ava,Quiz,Q06,77,100
Ava,Quiz,Q07,88,100
Ava,Final,FINAL,99,100
Ava,Survey,WS,5,5
Shane,Homework,H01,50,100
Shane,Homework,H02,60,100
Shane,Homework,H03,70,100
Shane,Homework,H04,60,100
Shane,Homework,H05,70,100
Shane,Homework,H06,80,100
Shane,Homework,H07,90,100
Shane,Lab,L01,90,100
Shane,Lab,L02,0,100
Shane,Lab,L03,100,100
Shane,Lab,L04,50,100
Shane,Lab,L05,40,100
Shane,Lab,L06,60,100
Shane,Lab,L07,80,100
Shane,Quiz,Q01,70,100
Shane,Quiz,Q02,90,100
Shane,Quiz,Q03,100,100
Shane,Quiz,Q04,100,100
Shane,Quiz,Q05,80,100
Shane,Quiz,Q06,80,100
Shane,Quiz,Q07,80,100
Shane,Final,FINAL,90,100
Shane,Survey,WS,5,5
awk script:
BEGIN {
    FS=" *\\, *"
}
FNR>1 {
    min[$3]=(!($3 in min) || min[$3]> $4 )? $4 : min[$3]
    max[$3]=(max[$3]> $4)? max[$3] : $4
    cnt[$3]++
    sum[$3]+=$4
}
END {
    print "Name\tLow\tHigh\tAverage"
    for (i in cnt)
        printf("%s\t%d\t%d\t%.1f\n", i, min[i], max[i], sum[i]/cnt[i])
}
Expected sample output:
Name Low High Average
Q06 77 95 86.80
L05 40 96 46.60
WS 5 5 5
Q07 58 100 78.80
L06 60 99 61
L07 77 86 64.80
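When I run the script, I get a Low value of 0 for every assignment, which is incorrect. Where am I going wrong? Please advise.
You can certainly do this with awk, but since you also tagged this as a scripting question, I'm assuming other tools are an option too. For this kind of statistics gathering over groups present in the data, GNU datamash often reduces the job to a simple one-liner. For example: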
$ (echo Name,Low,High,Average; datamash --header-in -s -t, -g3 min 4 max 4 mean 4 < input.csv) | tr , '\t'
Name Low High Average
FINAL 58 99 85.6
H01 19 90 47.8
H02 47 95 74.6
H03 70 95 82.2
H04 46 80 65
H05 54 95 76.6
H06 58 97 80.6
H07 52 90 67
L01 41 91 75
L02 0 100 61.4
L03 88 100 95.8
L04 48 100 79.2
L05 0 96 46.6
L06 0 99 61
L07 0 86 64.8
Q01 25 100 74.8
Q02 84 100 91.6
Q03 33 100 66.8
Q04 55 100 81
Q05 54 99 76.8
Q06 77 95 86.8
Q07 58 100 78.8
WS 5 5 5
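This says: for simple CSV input ( -t, ) with a header ( --header-in ), for each group of rows that share the same value in the third column ( -g3, plus -s to sort the input, which the tool requires for grouping), display the minimum, maximum, and mean of the fourth column. All of that is given a new header line and piped through tr to turn the commas into tabs.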
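If you would rather stay in awk, here is a minimal sketch of the same report. One plausible cause of the 0 lows (an assumption on my part, not verified against your awk) is that some awk implementations create min[$3] as part of the assignment before the ($3 in min) test on the right-hand side runs, so the test never fires and the empty element wins the comparison. The sketch sidesteps that by doing the membership test in its own if statement. The function name grade_report is just a placeholder, and it splits on a plain comma rather than your " *\\, *" pattern.

```shell
# Hypothetical helper: reads the CSV on stdin, prints a tab-separated report.
grade_report() {
    awk -F',' '
        NR > 1 {                      # skip the header row
            score = $4 + 0            # force a numeric value
            # Test membership in a separate statement, before any
            # assignment touches min[$3] or max[$3].
            if (!($3 in cnt)) {
                min[$3] = max[$3] = score
            } else {
                if (score < min[$3]) min[$3] = score
                if (score > max[$3]) max[$3] = score
            }
            cnt[$3]++
            sum[$3] += score
        }
        END {
            print "Name\tLow\tHigh\tAverage"
            # Note: for-in visits the groups in no particular order.
            for (name in cnt)
                printf "%s\t%d\t%d\t%.1f\n", name, min[name], max[name], sum[name] / cnt[name]
        }
    '
}

# Small demo on three L05 rows taken from the input above.
printf '%s\n' \
    'Student,Catehory,Assignment,Score,Possible' \
    'Chelsey,Lab,L05,96,100' \
    'Sam,Lab,L05,0,100' \
    'Andrew,Lab,L05,42,100' | grade_report
```

On the full file you would run it as grade_report < input.csv. Note that a genuine score of 0 (Sam's L05, for instance) is still reported as the low, which matches the datamash output above rather than the sample in the question.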