How do I calculate percentages for every column?

jpa*_*mer 5 awk

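I'm trying to convert some data into percentages of each column's total. It is very similar to this thread, except that I need to do it for every column: Calculate and Divide by total with AWK

The data looks like this (but with more columns and rows):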

ID     Sample1     Sample2      Sample3
One      10          0            5
Two      3           6            8
Three    3           4            7

The desired output looks like this:

ID     Sample1     Sample2     Sample3
One     62.50        0.0        25.0
Two     18.75       60.0        40.0
Three   18.75       40.0        35.0   

The following works for a single column, but I want to do this for every column except the first.

gawk -F"\t" '{a[NR]=$1;x+=(b[NR]=$2)}END{while(++i<=NR)print a[i]"\t"100*b[i]/x}' file.txt 

Any help would be much appreciated.

jan*_*nos 4

The output is not 100% identical to what you asked for, but hopefully it is close enough:

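# Requires GNU awk 4.0 or later: col[i][NR] is an array of arrays.
function percent(value, total) {
    return sprintf("%.2f", 100 * value / total)
}
{
    label[NR] = $1
    for (i = 2; i <= NF; ++i) {
        # Store every cell and add it to its column total.
        # The header names on line 1 are non-numeric, so they add 0.
        sum[i] += col[i][NR] = $i
    }
}
END {
    # Rebuild the header from the values stored for line 1.
    title = label[1]
    for (i = 2; i <= length(col) + 1; ++i) {
        title = title "\t" col[i][1]
    }
    print title
    # Print each data row as percentages of the column totals.
    for (j = 2; j <= NR; ++j) {
        line = label[j]
        for (i = 2; i <= length(col) + 1; ++i) {
            line = line "\t" percent(col[i][j], sum[i])
        }
        print line
    }
}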

This produces the output:

ID    Sample1 Sample2 Sample3
One   62.50   0.00    25.00
Two   18.75   60.00   40.00
Three 18.75   40.00   35.00

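Run it with: gawk -f script.awk file.txt

You could of course compress the script into a one-liner, but I think it is better kept in a script file like this, where it is easier to read and maintain.

A simpler and better version, which also works with BSD AWK and not only GNU AWK, because it uses the portable col[i, NR] subscript form instead of gawk's arrays of arrays: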

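function percent(value, total) {
    return sprintf("%.2f", 100 * value / total)
}
BEGIN { OFS = "\t" }
# Header line: turn runs of spaces into tabs and print it unchanged otherwise.
NR == 1 { gsub(/ +/, OFS); print; next }
{
    label[NR] = $1
    for (i = 2; i <= NF; ++i) {
        # col[i, NR] is an ordinary array keyed by "i SUBSEP NR".
        sum[i] += col[i, NR] = $i
    }
}
END {
    # NF still holds the field count of the last record here.
    for (j = 2; j <= NR; ++j) {
        $1 = label[j]
        for (i = 2; i <= NF; ++i) {
            $i = percent(col[i, j], sum[i])
        }
        print
    }
}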
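Both scripts above keep every cell in memory until END. If the file is large, one possible alternative is to let awk read the file twice: the first pass collects the column totals and the second pass prints the percentages. The following is only a sketch under a few assumptions of mine that are not in the original answer: the input is a regular file (not a pipe) so it can be read twice, the helper name percent_columns is made up for illustration, and a column whose total is 0 is printed as 0.00 rather than triggering a division by zero.

```shell
# Create the sample input from the question (whitespace-separated).
cat > file.txt <<'EOF'
ID     Sample1     Sample2      Sample3
One      10          0            5
Two      3           6            8
Three    3           4            7
EOF

# percent_columns is a hypothetical helper name, not part of the original answer.
# Pass 1 (NR == FNR): accumulate the column totals.
# Pass 2: print the header, then each value as a percentage of its column total.
percent_columns() {
    awk '
    BEGIN { OFS = "\t" }
    NR == FNR {                        # first pass over the file
        if (FNR > 1)
            for (i = 2; i <= NF; ++i) sum[i] += $i
        next
    }
    FNR == 1 { $1 = $1; print; next }  # rebuild the header with OFS
    {
        for (i = 2; i <= NF; ++i)
            # assumption: an all-zero column is reported as 0.00
            $i = sprintf("%.2f", sum[i] ? 100 * $i / sum[i] : 0)
        print
    }' "$1" "$1"
}

percent_columns file.txt
```

The file name is passed twice on purpose: NR == FNR is only true while awk is reading the first copy, which is what separates the two passes. For the sample data this should print the same tab-separated table as the scripts above.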