Check whether two files match on two column values and print the matching lines to a new output file

kll*_*rdr 0 shell awk

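I want to match two files based on the values in two columns of each file. If the "BP" and "P" values match on the same line, I want to print those lines to a third file, like file 2.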

File 1:

CHR BP BETA SE P PHENOTYPE FDR CATEGORY SNP
10 110408937 3.386e+00 1.333e+00 1.112e-02 1 1 Medication rs113627704
10 110408937 4.409e+00 1.623e+00 6.602e-03 2 1 Cardiovascular rs113627704
10 110408937 2.382e+00 1.124e+00 3.414e-02 3 1 Medication rs113627704

File 2:

CHR F SNP BP P TOTAL
10 1 rs113627704 110408937 1.112e-02 456
4 1 rs43567 2345677 0.045457 567
3 1 rs567899 479899 0.3456 223

Desired output:

CHR BP BETA SE P PHENOTYPE FDR CATEGORY SNP
10 110408937 3.386e+00 1.333e+00 1.112e-02 1 1 Medication rs113627704

I tried the following two commands:

?awk 'FNR==NR{a[$4,$5]=$0;next}{if(b=a[$2,$5]){print b}}' file1 file2 > file3

Here I get the error "bash: awk: command not found". I use awk all the time and it has always worked.

awk 'FNR==NR {a[$4,$5]=$0; next} ($4,$5) in a {print a[$2,$5], $0}' file1 file2 > file3

Here I get an empty file.

Jam*_*own 5

This should work:

$ awk 'NR==FNR{a[$4,$5]=$0;next}(($2,$5) in a)' file2 file1

Output:

CHR BP BETA SE P PHENOTYPE FDR CATEGORY SNP
10 110408937 3.386e+00 1.333e+00 1.112e-02 1 1 Medication rs113627704
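To try this end to end, here is a minimal sketch that recreates the sample data from the question in a scratch directory and writes the result to file3, as the question asks. The `workdir` variable and the use of `mktemp` are only assumptions made for the demonstration; the awk command is the one shown above with an output redirect added.

```shell
#!/bin/sh
# Work in a throwaway directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data copied from the question.
cat > file1 <<'EOF'
CHR BP BETA SE P PHENOTYPE FDR CATEGORY SNP
10 110408937 3.386e+00 1.333e+00 1.112e-02 1 1 Medication rs113627704
10 110408937 4.409e+00 1.623e+00 6.602e-03 2 1 Cardiovascular rs113627704
10 110408937 2.382e+00 1.124e+00 3.414e-02 3 1 Medication rs113627704
EOF

cat > file2 <<'EOF'
CHR F SNP BP P TOTAL
10 1 rs113627704 110408937 1.112e-02 456
4 1 rs43567 2345677 0.045457 567
3 1 rs567899 479899 0.3456 223
EOF

# Index file2 by (BP, P) = fields 4 and 5, then print the file1 lines
# whose (BP, P) = fields 2 and 5 were seen in file2.
awk 'NR==FNR{a[$4,$5]=$0;next}(($2,$5) in a)' file2 file1 > file3

cat file3
```

The header line of file1 is printed as well because the column names "BP" and "P" sit in the key positions of both header lines, so they match like any other pair of values.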

Explanation:

$ awk '
NR==FNR {         # process file2 first; the lines we want to output come from file1
    a[$4,$5]=$0   # desired fields are 4th and 5th, use them as hash key
    next          # move to next record
}                 # process file1 below this point
(($2,$5) in a)    # test if 2nd and 5th in hash and output
' file2 file1     # mind the file order
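The point about key fields and file order also seems to explain the empty file from the second attempt in the question. That command reads file1 first but indexes it by fields 4 and 5, which in file1 are SE and P, not BP and P, so no key from file2 can ever be found. The sketch below prints the keys each approach builds, using one data line from each file as a stand-in; the variable names are only illustrative.

```shell
#!/bin/sh
# One data line from each file in the question, used as stand-ins.
line1='10 110408937 3.386e+00 1.333e+00 1.112e-02 1 1 Medication rs113627704'
line2='10 1 rs113627704 110408937 1.112e-02 456'

# Key the question's second attempt builds from file1: fields 4 and 5 (SE, P).
key_attempt=$(printf '%s\n' "$line1" | awk '{ print $4, $5 }')
# Key the working command takes from file1: fields 2 and 5 (BP, P).
key_file1=$(printf '%s\n' "$line1" | awk '{ print $2, $5 }')
# Key taken from file2: fields 4 and 5 (BP, P).
key_file2=$(printf '%s\n' "$line2" | awk '{ print $4, $5 }')

echo "attempt key from file1: $key_attempt"
echo "working key from file1: $key_file1"
echo "key from file2:         $key_file2"
```

Only the last two keys agree. Note also that the keys are compared as text, so a value written as 1.112e-02 in one file and 0.01112 in the other would not match even though the numbers are equal.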