use*_*340 9 awk text-processing
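I read "Comparing two files using Unix and Awk". It is really interesting. I read and tested it, but I cannot fully understand it or apply it to other cases.
I have two files. file1 has one field and file2 has 16 fields. I want to read the elements of file1 and compare them with field 3 of file2. For every match, I want to sum the values of field 5 in file2.
For example, file1: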
1
2
3
file2:
2 2 2 1 2
3 6 1 2 4
4 1 1 2 3
6 3 3 3 4
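For element 1 of file1, I want to add up the values in field 5 of the file2 lines whose field 3 is 1, and then do the same for elements 2 and 3 of file1. The output for 1 is 7 (3+4), the output for 2 is 2, and the output for 3 is 4.
I don't know how I should write this in awk.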
ter*_*don 22
Here is one way to do it. I have written it as an awk script so that I can add comments:
#!/usr/local/bin/awk -f
{
    ## FNR is the line number of the current file, NR is the number of
    ## lines that have been processed. If you only give one file to
    ## awk, FNR will always equal NR. If you give more than one file,
    ## FNR will go back to 1 when the next file is reached but NR
    ## will continue incrementing. Therefore, NR == FNR only while
    ## the first file is being processed.
    if(NR == FNR){
        ## If this is the first file, save the values of $1
        ## in the array n.
        n[$1] = 0
    }
    ## If we have moved on to the 2nd file
    else{
        ## If the 3rd field of the second file exists in
        ## the first file.
        if($3 in n){
            ## Add the value of the 5th field to the corresponding value
            ## of the n array.
            n[$3]+=$5
        }
    }
}
## The END{} block is executed after all files have been processed.
## This is useful since you may have more than one line whose 3rd
## field was specified in the first file so you don't want to print
## as you process the files.
END{
    ## For each element in the n array
    for (i in n){
        ## print the element itself and then its value
        print i,":",n[i];
    }
}
You can save this as a file, make it executable, and run it like this:
$ chmod a+x foo.awk
$ ./foo.awk file1 file2
1 : 7
2 : 2
3 : 4
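To see why `NR == FNR` is only true for the first file, it can help to print both counters for every input line. The following is a small sketch rather than part of the script above; the scratch directory and the `nr_fnr` variable are names made up for the demonstration, and the two files are the samples from the question.

```shell
# Recreate the sample files from the question in a scratch directory
# (tmpdir and nr_fnr are illustrative names, not part of the original answer).
tmpdir=$(mktemp -d)
printf '1\n2\n3\n' > "$tmpdir/file1"
printf '2 2 2 1 2\n3 6 1 2 4\n4 1 1 2 3\n6 3 3 3 4\n' > "$tmpdir/file2"

# Print the current file name, NR (lines read so far across all files)
# and FNR (line number within the current file) for every input line.
nr_fnr=$(cd "$tmpdir" && awk '{ print FILENAME, NR, FNR }' file1 file2)
printf '%s\n' "$nr_fnr"
# file1 1 1
# file1 2 2
# file1 3 3
# file2 4 1
# file2 5 2
# file2 6 3
# file2 7 4

rm -rf "$tmpdir"
```

The two counters agree only on the file1 lines. One caveat: if the first file is empty, NR and FNR also agree while the second file is read, so the idiom would treat file2 as if it were file1.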
Alternatively, you can condense it into a single command:
awk '
(NR == FNR){n[$1] = 0; next}
{if($3 in n){n[$3]+=$5}}
END{for (i in n){print i,":",n[i]} }' file1 file2
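Note that awk does not specify the order in which `for (i in n)` visits the array, so the 1, 2, 3 ordering in the sample output is not guaranteed on every implementation. If a numeric order by key is what you want (an assumption about the desired output, not something the question states), one possible sketch is to pipe the result through `sort -n`. The scratch directory and the `sums` variable are illustrative names.

```shell
# Recreate the sample files from the question in a scratch directory
# (tmpdir and sums are illustrative names).
tmpdir=$(mktemp -d)
printf '1\n2\n3\n' > "$tmpdir/file1"
printf '2 2 2 1 2\n3 6 1 2 4\n4 1 1 2 3\n6 3 3 3 4\n' > "$tmpdir/file2"

sums=$(awk '
  NR == FNR { n[$1] = 0; next }   # first file: remember each key
  $3 in n   { n[$3] += $5 }       # second file: accumulate field 5 per key
  END       { for (i in n) print i, ":", n[i] }
' "$tmpdir/file1" "$tmpdir/file2" | sort -n)   # sort numerically by key

printf '%s\n' "$sums"
# 1 : 7
# 2 : 2
# 3 : 4

rm -rf "$tmpdir"
```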
Another approach reads file2 first, sums field 5 for each value of field 3, and then prints one line for each entry of file1:
awk '
NR == FNR {n[$3] += $5; next}
{print $1 ": " n[$1]}' file2 file1
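Because this version prints while reading file1, the output follows the order of file1. An entry of file1 that never appears in field 3 of file2 is printed with an empty value, though. The sketch below is one way to print 0 instead, by adding `+ 0` to force a numeric result. The key 9 in file1 is hypothetical, added only to show the unmatched case, and the scratch directory and `ordered` variable are illustrative names.

```shell
# file1 here is deliberately out of order and contains a key (9) that
# does not occur in field 3 of file2; 9 is a hypothetical value.
tmpdir=$(mktemp -d)
printf '3\n1\n9\n' > "$tmpdir/file1"
printf '2 2 2 1 2\n3 6 1 2 4\n4 1 1 2 3\n6 3 3 3 4\n' > "$tmpdir/file2"

ordered=$(awk '
  NR == FNR { n[$3] += $5; next }   # file2 comes first: sum field 5 per field 3
  { print $1 ": " (n[$1] + 0) }     # file1: print each key, 0 when unmatched
' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$ordered"
# 3: 4
# 1: 7
# 9: 0

rm -rf "$tmpdir"
```

The trade-off between the two approaches is memory versus ordering: this version stores a sum for every distinct value of field 3 in file2, while the first version stores only the keys listed in file1.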