Get the average of the values in one field, grouped by the value in another field

Question

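Is there a way to get the average of the values in one field, grouped by the value in another field? For example, for the following input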

a x 3
b y 4
a y 2
b x 5
b x 20

I would like this output

a 2.5
b 9.67

I found this awk script, which gets the average of the values in a column

awk '{ total += $3; count++ } END { print total/count }' file.txt

But how do I add a for loop to it so that I get the average for each distinct value in column 1?

The file is tab-delimited.

Thanks

Answer 1

You're not far off. Try the following, which uses arrays indexed by $1:

$ awk '{ total[$1] += $3; count[$1]++ } END {for (t in total) print t, total[t]/count[t]}' file
a 2.5
b 9.66667

Or, if you want exactly two decimal places, as shown in your question:

$ awk '{ total[$1] += $3; count[$1]++ } END {for (t in total) printf "%s %.2f\n", t, total[t]/count[t]}' file
a 2.50
b 9.67
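
As a minimal end-to-end sketch of the same approach, the snippet below recreates the sample input from the question as a tab-delimited temporary file and runs the grouped average on it. The names tmp, sum, n, and result are made up for illustration. Two details are additions rather than part of the answer above: -F'\t' states the tab separator explicitly (awk's default whitespace splitting also handles tabs, as long as no field contains spaces), and the output is piped through sort because awk does not guarantee the order in which for (k in sum) visits the keys.

```shell
# Recreate the question's sample data as a tab-delimited temporary file.
# (tmp is a hypothetical name used only for this sketch.)
tmp=$(mktemp)
printf 'a\tx\t3\nb\ty\t4\na\ty\t2\nb\tx\t5\nb\tx\t20\n' > "$tmp"

# sum[] and n[] are both keyed by column 1: sum[] accumulates column 3,
# n[] counts the rows seen for that key. END prints one mean per key.
result=$(awk -F'\t' '
    { sum[$1] += $3; n[$1]++ }
    END { for (k in sum) printf "%s %.2f\n", k, sum[k] / n[k] }
' "$tmp" | sort)

rm -f "$tmp"
echo "$result"
```

With the sample data this prints a 2.50 and b 9.67, each on its own line.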

Answer 2

Miller (mlr) is also handy for tasks like this.

$ mlr --nidx stats1 -a mean -f 3 -g 1 file.txt
a 2.500000
b 9.666667

Or (with more recent versions, which have the format-values verb)

$ mlr --nidx stats1 -a mean -f 3 -g 1 then format-values -f '%.2f' file.txt
a 2.50
b 9.67