Rearranging a CSV file

bsm*_*moo 1 csv bash shell scripting awk

I have a file whose contents look like this:

Boy,Football
Boy,Football
Boy,Football
Boy,Squash
Boy,Tennis
Boy,Football
Girl,Tennis
Girl,Squash
Girl,Tennis
Girl,Tennis
Boy,Football

How can I use 'awk' or something similar to rearrange it into the following:

     Football Tennis Squash
Boy  5        1      1
Girl 0        3      1

I'm not even sure this is possible, but any help would be great.

Ed *_*ton 5

$ cat tst.awk
BEGIN{ FS=","; OFS="\t" }
{
    genders[$1]
    sports[$2]
    count[$1,$2]++
}
END {
    printf ""
    for (sport in sports) {
        printf "%s%s", OFS, sport
    }
    print ""
    for (gender in genders) {
        printf "%s", gender
        for (sport in sports) {
            printf "%s%s", OFS, count[gender,sport]+0
        }
        print ""
    }
}

$ awk -f tst.awk file
        Squash  Tennis  Football
Boy     1       1       5
Girl    1       3       0
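
A note on the `+0` in `count[gender,sport]+0`: a combination that never occurs in the input (Girl,Football here) has no element in `count`, and an unset element prints as an empty string. Adding 0 coerces it to the number 0, which is what puts a 0 in the table. A minimal sketch, assuming a hypothetical two-line sample rather than the file from the question:

```shell
# Hypothetical two-line sample; Girl,Football never occurs in it.
result=$(printf 'Boy,Football\nGirl,Tennis\n' | awk -F, '
{ count[$1,$2]++ }
END {
    # An element that was never set prints as an empty string...
    printf "[%s]", count["Girl","Football"]
    # ...while adding 0 coerces it to the number 0.
    printf "[%s]\n", count["Girl","Football"] + 0
}')
printf '%s\n' "$result"
```

The brackets are only there to make the empty string visible.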

Typically, when you know the end point of the loop, you put the OFS or ORS after each field:

for (i=1; i<=n; i++) {
    printf "%s%s", $i, (i<n?OFS:ORS)
}

But if you don't, you put the OFS before the second and subsequent fields and print the ORS after the loop:

for (x in array) {
    printf "%s%s", (++i>1?OFS:""), array[x]
}
print ""

I also like this idiom:

n = length(array)
for (x in array) {
    printf "%s%s", array[x], (++i<n?OFS:ORS)
}

It also gives you the end of the loop, but length(array) is gawk-specific.
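
If length(array) is not available in your awk, one workaround is to count the elements with a first pass over the array. This is a sketch and not part of the original answer. The array contents are hypothetical, and all values are identical so the output does not depend on the unspecified order of `for (x in array)`:

```shell
result=$(awk 'BEGIN {
    OFS = ","
    # Hypothetical sample data, used only for illustration.
    array["a"] = 1; array["b"] = 1; array["c"] = 1

    # Portable element count: walk the array once before printing.
    n = 0
    for (x in array) n++

    # Same separator idiom as above, without length(array).
    i = 0
    for (x in array) {
        printf "%s%s", array[x], (++i < n ? OFS : ORS)
    }
}')
printf '%s\n' "$result"
```

The cost is one extra pass over the array, which is usually negligible for header-sized arrays like the ones here.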

Another approach to consider:

$ cat tst.awk
BEGIN{ FS=","; OFS="\t" }
{
    for (i=1; i<=NF; i++) {
        if (!seen[i,$i]++) {
            map[i,++num[i]] = $i
        }
    }
    count[$1,$2]++
}
END {
    # i starts at 0 on purpose: map[2,0] is unset, so the header row begins with an empty cell
    for (i=0; i<=num[2]; i++) {
        printf "%s%s", map[2,i], (i<num[2]?OFS:ORS)
    }
    for (i=1; i<=num[1]; i++) {
        printf "%s%s", map[1,i], OFS
        for (j=1; j<=num[2]; j++) {
            printf "%s%s", count[map[1,i],map[2,j]]+0, (j<num[2]?OFS:ORS)
        }
    }
}

$ awk -f tst.awk file
        Football        Squash  Tennis
Boy     5       1       1
Girl    0       1       3

The last one prints the rows and columns in the order they were read, though it's less obvious how it works :-).
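
The order-preserving part of that script rests on the `!seen[...]++` test: it is true only the first time a value shows up, so each distinct value gets the next sequence number exactly once. A stripped-down sketch of just that step, assuming a hypothetical single-column input instead of the two-column file:

```shell
result=$(printf 'Tennis\nFootball\nTennis\nSquash\nFootball\n' | awk '
# seen[$0]++ is 0 (false) only the first time a value appears,
# so each distinct value is appended to order[] exactly once.
!seen[$0]++ { order[++n] = $0 }
END {
    # A counted loop replays the values in first-seen order,
    # which a "for (x in array)" loop does not guarantee.
    for (i = 1; i <= n; i++) {
        printf "%s%s", order[i], (i < n ? " " : "\n")
    }
}')
printf '%s\n' "$result"
```

The full script does the same thing per field, keying `seen` and `map` on the field number as well, which is how it tracks row labels and column labels separately.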

  • This is beautiful!!! Favoriting this question so I can refer back to it often! (2 upvotes)