awk: calculate the average of a field across multiple text files and merge the results into one file

jus*_*guy 2 bash awk

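I'm trying to calculate the average of $2 in multiple text files in a directory and combine the output into one tab-delimited output file. The output file has two fields: $1 contains the extracted filename prefix (pref), and $2 is the calculated average with one decimal place, rounded. The output also has a header: Sample in $1 and Percent in $2. The code below seems close, but I'm missing a few things (adding the header to the output, merging everything into one tab-delimited file, and rounding the average), and I can't work out how to get the desired output. Thanks :).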

123_base.txt

AASS     99.81
ABAT     100.00
ABCA10   0.0

456_base.txt

ABL2     97.81
ABO  100.00
ACACA    99.82

Desired output (tab-delimited)

Sample Percent
123    66.6
456    99.2
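For reference, the desired values are the plain arithmetic mean of column 2 of each file, rounded to the nearest tenth (rounding up would give 66.7 for sample 123):

```latex
\[
\text{123: } \frac{99.81 + 100.00 + 0.0}{3} = \frac{199.81}{3} \approx 66.603 \rightarrow 66.6
\]
\[
\text{456: } \frac{97.81 + 100.00 + 99.82}{3} = \frac{297.63}{3} = 99.21 \rightarrow 99.2
\]
```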

Bash

for f in /home/cmccabe/Desktop/20x/percent/*.txt ; do
 bname=$(basename $f)
 pref=${bname%%_base_*.txt}
 awk -v OFS='\t' '{ sum += $2 } END { if (NR > 0) print sum / NR }' $f /home/cmccabe/Desktop/NGS/bed/bedtools/IDP_total_target_length_by_panel/IDP_unix_trim_total_target_length.bed > /home/cmccabe/Desktop/20x/coverage/${pref}_average.txt
done
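For comparison, here is a minimal sketch of the question's own loop with its bugs fixed (not part of the answer below). It assumes the .bed file that the loop passes to awk as a second input is not meant to be averaged: awk sums $2 and counts NR across both files, so that file skews every result. It also assumes the prefix is everything before the first underscore. The original `${bname%%_base_*.txt}` never matches a name like `123_base.txt` because there is no underscore after `base`, so `pref` kept the full file name. The function name `summarize_loop` is hypothetical.

```bash
# Hypothetical helper: print one tab-delimited "Sample<TAB>Percent" table
# for all files given as arguments. Assumes only *_base.txt files are passed.
summarize_loop() {
    local f bname pref
    printf 'Sample\tPercent\n'                  # header, printed once
    for f in "$@"; do
        bname=$(basename "$f")
        pref=${bname%%_*}                       # "123_base.txt" -> "123"
        # Average column 2 of this file only; print 0.0 for an empty file.
        awk -v pref="$pref" '
            { sum += $2; n++ }
            END { printf "%s\t%.1f\n", pref, (n ? sum / n : 0) }
        ' "$f"
    done
}
```

Redirecting the function's output once, rather than redirecting each awk call, is what merges everything into a single file (the output file name `all_averages.txt` is hypothetical):

summarize_loop /home/cmccabe/Desktop/20x/percent/*_base.txt > /home/cmccabe/Desktop/20x/coverage/all_averages.txt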

gle*_*man 5

This one uses GNU awk, which provides the handy BEGINFILE and ENDFILE events:

gawk '
    BEGIN {print "Sample\tPercent"}
    BEGINFILE {sample = FILENAME; sub(/_.*/,"",sample); sum = n = 0}
    {sum += $2; n++}
    ENDFILE {printf "%s\t%.1f\n", sample, sum/n}
' 123_base.txt 456_base.txt 
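BEGINFILE and ENDFILE are GNU awk extensions, so mawk or BSD awk will not produce the intended output from the program above. A portable sketch of the same idea (a swapped-in technique, not from the answer) relies on FNR resetting to 1 at the start of each input file: the previous file's average is printed there, and the last file's in END. It also strips any leading directory with an extra sub(), so it works with /path/to/*_base.txt. The function name `avg_by_file` is hypothetical. One behavioral difference: an empty file never produces an FNR == 1 record, so it is skipped rather than reported.

```bash
# Portable sketch (POSIX awk, no gawk extensions). Hypothetical name avg_by_file.
avg_by_file() {
    awk '
        # Print the average collected for the previous file, if any.
        function flush() {
            if (n > 0)
                printf "%s\t%.1f\n", sample, sum / n
        }
        BEGIN { print "Sample\tPercent" }
        FNR == 1 {                      # first record of a new input file
            flush()
            sample = FILENAME
            sub(/.*\//, "", sample)     # drop any directory part
            sub(/_.*/, "", sample)      # "123_base.txt" -> "123"
            sum = n = 0
        }
        { sum += $2; n++ }
        END { flush() }                 # last file
    ' "$@"
}
```

Usage mirrors the answer: avg_by_file /path/to/*_base.txt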

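If you pass a glob that includes a directory path, FILENAME contains that path too, so I'd get the sample name like this instead: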

match(FILENAME, /^.*\/([^_]+)/, m); sample = m[1]

Then, yes, this works: gawk '...' /path/to/*_base.txt

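Borrowing the division-by-zero guard from James Brown's answer (so an empty file does not trigger gawk's fatal division-by-zero error):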

ENDFILE {printf "%s\t%.1f\n", sample, n==0 ? 0 : sum/n}
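The answer's pieces can be assembled into one gawk program, shown here as a sketch. It uses the match() form when FILENAME has a directory part and falls back to the original sub() when it does not. That if/else and the function name `avg_gawk` are additions, not part of the answer. It requires GNU awk, since BEGINFILE, ENDFILE and three-argument match() are gawk extensions.

```bash
# Requires GNU awk. Hypothetical name avg_gawk; the if/else that picks
# between the two sample-name methods is an addition, not from the answer.
avg_gawk() {
    gawk '
        BEGIN { print "Sample\tPercent" }
        BEGINFILE {
            if (match(FILENAME, /^.*\/([^_]+)/, m))
                sample = m[1]               # path given: name part before "_"
            else {
                sample = FILENAME           # bare file name
                sub(/_.*/, "", sample)
            }
            sum = n = 0
        }
        { sum += $2; n++ }
        ENDFILE { printf "%s\t%.1f\n", sample, n==0 ? 0 : sum/n }
    ' "$@"
}
```

Because BEGINFILE and ENDFILE also run for an empty file, the guard makes such a file show up as a 0.0 row instead of aborting the run.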