Adding up comma-separated values within a column

Zen*_*Mac 8 python awk

Hi, I have a file in this format (TSV):

Name  type    Age     Weight       Height 
Xxx   M    12,34,23  50,30,60,70   4,5,6,5.5 
Yxx   F    21,14,32  40,50,20,40   3,4,5,5.5

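I want to add up all the values in Age, Weight and Height, put each total in a new column after them, and then add a percentage column, e.g. Total_Height/Total_Weight (awk '$0=$0"\t"(NR==1?"Percentage" :$8/$7)'). My dataset is large, so I can't do this in Excel.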

The desired output looks like this:

Name  type    Age     Weight       Height     Total_Age Total_Weight Total_Height Percentage
Xxx   M    12,34,23  50,30,60,70   4,5,6,5.5   69        210         20.5          0.097            
Yxx   F    21,14,32  40,50,20,40   3,4,5,5.5   67        150         17.5          0.11 

Rav*_*h13 7

Using your shown samples, please try the following code.

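awk '
FNR==1{
  # Header line: append the new column names and skip to the next line.
  print $0,"Total_Age Total_Weight Total_Height Percentage"
  next
}
FNR>1{
  totAge=totWeight=totHeight=0
  # Sum the comma-separated values of Age ($3), Weight ($4) and Height ($5).
  split($3,tmp,",")
  for(i in tmp){
    totAge+=tmp[i]
  }
  split($4,tmp,",")
  for(i in tmp){
    totWeight+=tmp[i]
  }
  split($5,tmp,",")
  for(i in tmp){
    totHeight+=tmp[i]
  }
  # Append the three totals as new fields 6, 7 and 8.
  $(NF+1)=totAge
  $(NF+1)=totWeight
  $(NF+1)=totHeight
  # Percentage = Total_Height / Total_Weight, guarding against division by zero.
  $(NF+1)=( $(NF-1)==0 ? "N/A" : $NF/$(NF-1) )
}
1' Input_file | column -t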

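Or, a slightly different version of the above awk code, which uses a tab as OFS and appends the three totals in one go: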

awk '
BEGIN{OFS="\t"}
FNR==1{
  print $0,"Total_Age Total_Weight Total_Height Percentage"
  next
}
FNR>1{
  totAge=totWeight=totHeight=0
  split($3,tmp,",")
  for(i in tmp){
    totAge+=tmp[i]
  }
  split($4,tmp,",")
  for(i in tmp){
    totWeight+=tmp[i]
  }
  split($5,tmp,",")
  for(i in tmp){
    totHeight+=tmp[i]
  }
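  # Append all three totals at once as one tab-joined string...
  $(NF+1)=totAge OFS totWeight OFS totHeight
  # ...then re-split the record so they become separate fields 6, 7 and 8.
  $0=$0
  $(NF+1)=( $(NF-1)==0 ? "N/A" : $NF/$(NF-1) )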
}
1' Input_file | column -t
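The second version depends on $0=$0. As a small illustration of why (show_resplit is a hypothetical name and the two-field input is made up): a new field assigned a string that contains OFS still counts as one field until the record is re-split with FS.

```shell
#!/bin/sh
# Illustrative demo (show_resplit is a hypothetical name): a field that is
# assigned a string containing OFS is still ONE field until $0=$0 forces
# awk to re-split the record with FS.
show_resplit() {
  printf 'a b\n' | awk 'BEGIN { OFS = "\t" }
  {
    $(NF+1) = "x" OFS "y"   # one new field that happens to contain a tab
    before = NF             # still 3 at this point
    $0 = $0                 # re-split with the default FS (blanks and tabs)
    print before, NF        # NF is now 4
  }'
}

show_resplit   # prints 3 and 4, separated by a tab
```

Without the $0=$0 line, $(NF-1) and $NF in the percentage step would not point at Total_Weight and Total_Height.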

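Explanation: in short, sum the values in the 3rd, 4th and 5th columns and append each total as a new last column of the line. Then, as the OP asked, append one more column holding the last column divided by the second-to-last one (Total_Height/Total_Weight), or N/A if Total_Weight is 0. column -t is used to make the output look nicer.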
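The three split/for blocks above are identical apart from the column number. A minimal sketch that factors them into one user-defined awk function, assuming tab-separated input with Age, Weight and Height in columns 3 to 5 (add_totals and sum_csv are hypothetical names, not part of the answer):

```shell
#!/bin/sh
# Sketch, assuming tab-separated input with Age, Weight and Height in
# columns 3-5: the same totals as above, but with one awk function
# (sum_csv, a hypothetical name) instead of three copies of the split loop.
add_totals() {
  awk 'BEGIN { FS = OFS = "\t" }
  # Return the sum of the comma-separated numbers in s.
  function sum_csv(s,    vals, n, i, t) {
    t = 0
    n = split(s, vals, ",")
    for (i = 1; i <= n; i++) t += vals[i]
    return t
  }
  NR == 1 { print $0, "Total_Age", "Total_Weight", "Total_Height", "Percentage"; next }
  {
    a = sum_csv($3); w = sum_csv($4); h = sum_csv($5)
    # Percentage = Total_Height / Total_Weight, or N/A when the weight is 0.
    print $0, a, w, h, (w == 0 ? "N/A" : h / w)
  }' "$@"
}

printf 'Name\ttype\tAge\tWeight\tHeight\nXxx\tM\t12,34,23\t50,30,60,70\t4,5,6,5.5\n' | add_totals
```

The ratio is printed with awk's default OFMT (%.6g), so the Xxx row ends in 0.097619; pipe the result through column -t for aligned output, as in the answer.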


Ed *_*ton 5

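This uses any awk in any shell on every Unix box. It doesn't create new fields in each record (which makes awk rebuild the record every time a field is changed) and doesn't update the input record (which makes awk re-split the record into fields every time it is changed, which is inefficient), and it is designed to work for any number of value columns in any order: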

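$ cat tst.awk
BEGIN { FS=OFS="\t" }
# Print every input line unchanged, followed by a separator.
{ printf "%s%s", $0, OFS }
NR==1 {
    # Header: emit a Total_<name> column for each value column and
    # remember which header name belongs to which column number.
    for (i=3; i<=NF; i++) {
        printf "Total_%s%s", $i, OFS
        tags[i] = $i
    }
    print "Percentage"
    next
}
{
    delete tot
    for (i=3; i<=NF; i++) {
        # Sum this column's comma-separated values under its header name.
        tag = tags[i]
        n = split($i,vals,",")
        for (j in vals) {
            tot[tag] += vals[j]
        }
        printf "%s%s", tot[tag], OFS
    }
    # Height/Weight ratio looked up by name, so column order does not matter.
    printf "%0.3f%s", (tot["Weight"] ? tot["Height"] / tot["Weight"] : 0), ORS
}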

$ awk -f tst.awk file
Name    type    Age     Weight  Height  Total_Age       Total_Weight    Total_Height    Percentage
Xxx     M       12,34,23        50,30,60,70     4,5,6,5.5       69      210     20.5    0.098
Yxx     F       21,14,32        40,50,20,40     3,4,5,5.5       67      150     17.5    0.117

$ awk -f tst.awk file | column -t
Name  type  Age       Weight       Height     Total_Age  Total_Weight  Total_Height  Percentage
Xxx   M     12,34,23  50,30,60,70  4,5,6,5.5  69         210           20.5          0.098
Yxx   F     21,14,32  40,50,20,40  3,4,5,5.5  67         150           17.5          0.117
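The script resets the per-line totals with delete tot. Whole-array delete is supported by gawk, mawk and BWK awk, but it was a late addition to the POSIX standard. If you ever meet an older awk that rejects it, split("", tot) is the long-standing portable way to empty an array, since POSIX split() deletes all existing elements before assigning new ones. A tiny demo (clear_demo is an illustrative name):

```shell
#!/bin/sh
# Demo (clear_demo is an illustrative name): split("", arr) empties an
# array, a portable equivalent of the whole-array "delete arr".
clear_demo() {
  awk 'BEGIN {
    tot["Age"] = 69; tot["Weight"] = 210
    split("", tot)   # remove every element of tot
    n = 0
    for (k in tot) n++
    print n          # number of elements left
  }'
}

clear_demo   # prints 0
```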

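To show the functional benefit of the above approach, suppose you need to add more values, e.g. ShoeSize, and/or rearrange the order of the columns, e.g.: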

$ column -t file
Name  type  ShoeSize  Height     Age       Weight
Xxx   M     12,8,10   4,5,6,5.5  12,34,23  50,30,60,70
Yxx   F     9,7,8     3,4,5,5.5  21,14,32  40,50,20,40

Now run the same script and notice that you get a Total_ column for every original value column, and still the same Height/Weight Percentage column at the end:

$ awk -f tst.awk file | column -t
Name  type  ShoeSize  Height     Age       Weight       Total_ShoeSize  Total_Height  Total_Age  Total_Weight  Percentage
Xxx   M     12,8,10   4,5,6,5.5  12,34,23  50,30,60,70  30              20.5          69         210           0.098
Yxx   F     9,7,8     3,4,5,5.5  21,14,32  40,50,20,40  24              17.5          67         150           0.117
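Because the totals are keyed by header name, the same idea also lets you choose which columns form the ratio at run time instead of hard-coding Height and Weight. A minimal sketch of that extension, assuming tab-separated input whose value columns hold comma-separated numbers (ratio, num, den and sum_csv are hypothetical names, not part of the answer):

```shell
#!/bin/sh
# Sketch: choose the ratio columns by header name on the command line.
# ratio, num, den and sum_csv are hypothetical names; the named columns
# are assumed to hold comma-separated numbers, as in the script above.
ratio() {
  awk -v num="$1" -v den="$2" '
  BEGIN { FS = OFS = "\t" }
  function sum_csv(s,    vals, n, i, t) {
    t = 0
    n = split(s, vals, ",")
    for (i = 1; i <= n; i++) t += vals[i]
    return t
  }
  NR == 1 {
    # Map each header name to its column number.
    for (i = 1; i <= NF; i++) col[$i] = i
    if (!(num in col) || !(den in col)) {
      print "unknown column: " num " or " den > "/dev/stderr"
      exit 1
    }
    print $0, num "/" den
    next
  }
  {
    t_num = sum_csv($(col[num])); t_den = sum_csv($(col[den]))
    printf "%s%s%0.3f\n", $0, OFS, (t_den ? t_num / t_den : 0)
  }'
}

printf 'Name\ttype\tShoeSize\tHeight\tAge\tWeight\nXxx\tM\t12,8,10\t4,5,6,5.5\t12,34,23\t50,30,60,70\n' | ratio Height Weight
```

With the reordered sample, ratio Height Weight still yields 0.098 for Xxx, and ratio ShoeSize Age gives 30/69, printed as 0.435.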