Ali*_*423 5 bash pivot pivot-table
My tab-delimited input file looks like this:
13435 830169 830264 a 95 y 16
09433 835620 835672 x 46
30945 838405 838620 a 21 c 19
94853 850475 850660 y 15
04958 865700 865978 c 16 a 98
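After the first three columns, each line lists variables and their values as name/value pairs. I need to restructure the data so that, after the first three columns, there is one column per variable, like this: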
                        a     x     y     c
13435   830169  830264  95          16
09433   835620  835672        46
30945   838405  838620  21                19
94853   850475  850660              15
04958   865700  865978  98                16
Is there any code that can do this under Linux? The file is 7.6 MB with about 450,000 lines in total. There are four variables in total.
Thanks
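Assumptions:
- the variable names (a/c/x/y in the sample input) are not known beforehand
- the input file fits in memory (via awk arrays); this allows a single pass over the input file; if memory is an issue (i.e., the input file cannot fit in memory), a different coding/design is needed (not addressed in this answer)

Another awk idea, which requires GNU awk for its multi-dimensional arrays and the PROCINFO["sorted_in"] construct: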
awk '
BEGIN { FS=OFS="\t" }                            # input/output field delimiters = <tab>
      { first3[FNR]=$1 OFS $2 OFS $3             # store first 3 fields
        for (i=4;i<=NF;i=i+2) {                  # loop through rest of fields, 2 at a time
            vars[$i]                             # keep track of variable names
            values[FNR][$i]=$(i+1)               # store the value for this line/variable combo
        }
      }
END   { PROCINFO["sorted_in"]="@ind_str_asc"     # sort vars[] indexes in ascending order
        printf "%s%s", OFS, OFS                  # start printing header line ...
        for (v in vars)                          # loop through variable names ...
            printf "%s%s", OFS, v                # printing to header line
        printf "\n"                              # terminate header line
        for (i=1;i<=FNR;i++) {                   # loop through our set of lines ...
            printf "%s",first3[i]                # print the 1st 3 fields and then ...
            for (v in vars)                      # loop through list of all variables ...
                printf "%s%s",OFS,values[i][v]   # printing the associated value; non-existent values default to the empty string ""
            printf "\n"                          # terminate the current line of output
        }
      }
' inputfile
Note: this design handles any number of variables.
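If GNU awk is not available, the same single-pass idea can probably be expressed with POSIX awk features only. The following is a minimal sketch, not part of the original answer: the function name pivot_portable is hypothetical, the (line, variable) pair is stored under a single SUBSEP-joined key instead of a true multi-dimensional array, and the variable names are ordered with a small insertion sort instead of PROCINFO["sorted_in"]. It still assumes the whole file fits in memory.

```shell
# pivot_portable FILE  (hypothetical helper name)
# Single-pass pivot of a tab-delimited file using only POSIX awk features.
pivot_portable() {
    awk '
    BEGIN { FS = OFS = "\t" }
    {
        first3[FNR] = $1 OFS $2 OFS $3            # store first 3 fields
        for (i = 4; i <= NF; i += 2) {            # walk the name/value pairs
            if (!($i in seen)) {                  # first time this variable name appears
                seen[$i] = 1
                names[++n] = $i
            }
            values[FNR, $i] = $(i + 1)            # key is line number and name joined by SUBSEP
        }
        last = FNR                                # remember the number of lines read
    }
    END {
        # insertion sort of names[1..n], forcing string comparison
        for (i = 2; i <= n; i++) {
            v = names[i]
            for (j = i - 1; j >= 1 && (names[j] "") > (v ""); j--)
                names[j + 1] = names[j]
            names[j + 1] = v
        }
        printf "%s%s", OFS, OFS                   # header: three empty leading fields
        for (k = 1; k <= n; k++) printf "%s%s", OFS, names[k]
        printf "\n"
        for (i = 1; i <= last; i++) {
            printf "%s", first3[i]
            for (k = 1; k <= n; k++)              # missing values print as empty strings
                printf "%s%s", OFS, values[i, names[k]]
            printf "\n"
        }
    }
    ' "$1"
}
```

It would be called as pivot_portable inputfile. The column order follows awk's string comparison, which may differ from gawk's @ind_str_asc for names outside plain ASCII letters.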
For demonstration purposes we'll use the following tab-delimited input files:
$ cat input4 # OP's sample input file w/ 4 variables
13435 830169 830264 a 95 y 16
09433 835620 835672 x 46
30945 838405 838620 a 21 c 19
94853 850475 850660 y 15
04958 865700 865978 c 16 a 98
$ cat input6 # 2 additional variables added to OP's original input file
13435 830169 830264 a 95 y 16
09433 835620 835672 x 46 t 375
30945 838405 838620 a 21 c 19
94853 850475 850660 y 15 j 127 t 453
04958 865700 865978 c 16 a 98
Running these through the awk script generates:
############# input4
                        a     c     x     y
13435   830169  830264  95                16
09433   835620  835672              46
30945   838405  838620  21    19
94853   850475  850660                    15
04958   865700  865978  98    16
############# input6
                        a     c     j     t     x     y
13435   830169  830264  95                            16
09433   835620  835672                    375   46
30945   838405  838620  21    19
94853   850475  850660              127   453         15
04958   865700  865978  98    16
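The answer leaves the memory-constrained case open. One possible design, sketched here as an assumption rather than as the answer's method, reads the file twice: the first pass collects only the distinct variable names, and the second pass prints each line as soon as it is read, so memory use depends on the number of variables rather than the number of lines. The function name pivot_two_pass is hypothetical, and the input must be a regular file (not a pipe) because it is opened twice.

```shell
# pivot_two_pass FILE  (hypothetical helper name)
# Two-pass pivot: pass 1 gathers variable names, pass 2 streams the rows.
pivot_two_pass() {
    awk '
    BEGIN { FS = OFS = "\t" }
    NR == FNR {                                   # pass 1: collect distinct variable names
        for (i = 4; i <= NF; i += 2)
            if (!($i in seen)) { seen[$i] = 1; names[++n] = $i }
        next
    }
    FNR == 1 {                                    # start of pass 2: sort names, print header
        for (i = 2; i <= n; i++) {                # insertion sort, string comparison
            v = names[i]
            for (j = i - 1; j >= 1 && (names[j] "") > (v ""); j--)
                names[j + 1] = names[j]
            names[j + 1] = v
        }
        printf "%s%s", OFS, OFS
        for (k = 1; k <= n; k++) printf "%s%s", OFS, names[k]
        printf "\n"
    }
    {                                             # pass 2: emit the current row immediately
        split("", row)                            # portable way to empty the per-line array
        for (i = 4; i <= NF; i += 2) row[$i] = $(i + 1)
        printf "%s%s%s%s%s", $1, OFS, $2, OFS, $3
        for (k = 1; k <= n; k++) printf "%s%s", OFS, row[names[k]]
        printf "\n"
    }
    ' "$1" "$1"
}
```

It would be called as pivot_two_pass inputfile. The trade-off is reading the file twice, which for a 7.6 MB input is unlikely to matter.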