Match columns between files and generate a file combining data from both, using the terminal/PowerShell/command-line Bash

CEL*_*CEL 1 bash terminal powershell awk

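I have two .txt files of different lengths and would like to do the following:

If a value in column 1 of file 1 also appears in column 1 of file 2, print column 2 of file 2, followed by the whole corresponding line from file 1.

I have tried permutations of awk, but with no success so far!

Thanks!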

File 1:

MARKERNAME EA NEA BETA SE
10:1000706 T C -0.021786390809225 0.519667838651725
1:715265 G C 0.0310128798578049 0.0403763946716293
10:1002042 CCTT C 0.0337857775471699 0.0403300629299562

File 2:

CHR:BP SNP  CHR BP  GENPOS  ALLELE1 ALLELE0 A1FREQ  INFO    
1:715265 rs12184267 1   715265  0.0039411   G   C   0.964671
1:715367 rs12184277 1   715367  0.00394384  A   G   0.964588

Desired file 3:

SNP        MARKERNAME EA NEA BETA SE
rs12184267 1:715265 G C 0.0310128798578049 0.0403763946716293

Attempts:

awk -F'|' 'NR==FNR { a[$1]=1; next } ($1 in a) { print $3, $0 }' file1 file2
awk 'NR==FNR{A[$1]=$2;next}$0 in A{$0=A[$0]}1' file1 file2

Rav*_*h13 5

With your shown samples, could you please try the following.

awk '
FNR==1{
  if(++count==1){ col=$0 }
  else{ print $2,col }
  next
}
FNR==NR{
  arr[$1]=$0
  next
}
($1 in arr){
  print $2,arr[$1]
}
' file1 file2

Explanation: Adding a detailed explanation for the above.

awk '                              ##Starting the awk program here.
FNR==1{                            ##True on the first (header) line of each input file.
  if(++count==1){ col=$0 }         ##First file seen (file1): save its header line in col.
  else{ print $2,col }             ##Second file (file2): print its 2nd header field, then file1 header.
  next                             ##next skips all further statements for this line.
}
FNR==NR{                           ##This is TRUE only while file1 is being read.
  arr[$1]=$0                       ##Store the whole file1 line in arr, indexed by its 1st field.
  next                             ##next skips all further statements for this line.
}
($1 in arr){                       ##While reading file2: check whether its 1st field is an index of arr.
  print $2,arr[$1]                 ##Print 2nd field of file2, then the saved file1 line.
}
' file1 file2                      ##Input file names; file1 must come first.
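
To try this end to end, here is a minimal sketch that recreates the sample files from the question in a scratch directory and runs the same awk program on them. The scratch directory and the redirect to a file named file3 are assumptions made for illustration; the question only calls the desired result "file 3".

```shell
#!/usr/bin/env bash
# Sketch only: rebuilds the question's sample inputs and runs the awk program.
# The scratch directory and the output name "file3" are assumptions.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Sample file 1 from the question (whitespace-separated).
cat > file1 <<'EOF'
MARKERNAME EA NEA BETA SE
10:1000706 T C -0.021786390809225 0.519667838651725
1:715265 G C 0.0310128798578049 0.0403763946716293
10:1002042 CCTT C 0.0337857775471699 0.0403300629299562
EOF

# Sample file 2 from the question (whitespace-separated).
cat > file2 <<'EOF'
CHR:BP SNP  CHR BP  GENPOS  ALLELE1 ALLELE0 A1FREQ  INFO
1:715265 rs12184267 1   715265  0.0039411   G   C   0.964671
1:715367 rs12184277 1   715367  0.00394384  A   G   0.964588
EOF

# Same program as above: remember file1 lines by column 1, then for each
# file2 line whose column 1 was seen, print file2 column 2 plus the file1 line.
awk '
FNR==1{
  if(++count==1){ col=$0 }
  else{ print $2,col }
  next
}
FNR==NR{
  arr[$1]=$0
  next
}
($1 in arr){
  print $2,arr[$1]
}
' file1 file2 > file3

cat file3
```

With the sample data this should print the header line `SNP MARKERNAME EA NEA BETA SE` followed by the single matching row for rs12184267. Note that matched rows come out in the order they appear in file2, because file2 is the file being streamed while file1 is held in memory.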