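I am trying to calculate the average of `$2` across multiple text files in a directory and merge the output into one tab-delimited output file. The output file has two fields: `$1` contains the file name as extracted by `pref`, and `$2` is the calculated average with one decimal, rounded up. There is also a header in the output: `Sample` in `$1` and `Percent` in `$2`. The code below seems close, but I am missing a few things (adding the header to the output, merging into one tab-delimited file, and rounding to 3 decimal places) that I do not know how to do, and I am not getting the desired output. Thank you :).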
123_base.txt
AASS 99.81
ABAT 100.00
ABCA10 0.0
456_base.txt
ABL2 97.81
ABO 100.00
ACACA 99.82
Desired output (tab-delimited)
Sample Percent
123 66.6
456 99.2
Bash
for f in /home/cmccabe/Desktop/20x/percent/*.txt ; do
bname=$(basename $f)
pref=${bname%%_base_*.txt}
awk -v OFS='\t' '{ sum += $2 } END { if (NR > 0) print sum / NR }' $f /home/cmccabe/Desktop/NGS/bed/bedtools/IDP_total_target_length_by_panel/IDP_unix_trim_total_target_length.bed > /home/cmccabe/Desktop/20x/coverage/${pref}_average.txt
done
This uses GNU awk, which provides the handy `BEGINFILE` and `ENDFILE` events:
gawk '
BEGIN {print "Sample\tPercent"}
BEGINFILE {sample = FILENAME; sub(/_.*/,"",sample); sum = n = 0}
{sum += $2; n++}
ENDFILE {printf "%s\t%.1f\n", sample, sum/n}
' 123_base.txt 456_base.txt
If you pass a glob pattern that includes a directory, I would extract the sample name like this:
match(FILENAME, /^.*\/([^_]+)/, m); sample = m[1]
Then, yes, this is fine: gawk '...' /path/to/*_base.txt
Taking inspiration from James Brown's answer, and borrowing its division-by-zero check:
ENDFILE {printf "%s\t%.1f\n", sample, n==0 ? 0 : sum/n}
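Putting the pieces together, here is a minimal sketch that prints the header, derives the sample name whether or not the file name carries a directory, and guards against empty files. The function name `average_by_sample` is hypothetical. The two `sub()` calls are a swapped-in alternative to the `match()` call above, chosen so the same code also works when no directory is present. It assumes GNU awk is installed as `gawk`.

```shell
#!/usr/bin/env bash
# Sketch only: requires GNU awk (gawk) for BEGINFILE/ENDFILE.
# average_by_sample is a hypothetical helper name, not part of the original answer.
average_by_sample() {
    gawk '
        BEGIN     { print "Sample\tPercent" }          # header, printed once
        BEGINFILE {
            sample = FILENAME
            sub(/^.*\//, "", sample)                   # drop any leading directory
            sub(/_.*/, "", sample)                     # drop "_base.txt"
            sum = n = 0                                # reset per-file totals
        }
        { sum += $2; n++ }                             # accumulate column 2
        ENDFILE   { printf "%s\t%.1f\n", sample, n == 0 ? 0 : sum / n }
    ' "$@"
}
```

Calling it as `average_by_sample /home/cmccabe/Desktop/20x/percent/*.txt > averages.txt` would write everything to a single tab-delimited file. The output name `averages.txt` is only a placeholder here.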