比较文件 B 与 A 并使用 awk 、sed 或 grep 从 A 中提取数据

Rhe*_*hea 3 grep awk

我有两个文件文件 A,其中包含所有数据,而另一个文件 B 只有 ID,我想要将文件 B 与文件 A 进行比较并检索该 ID 中存在的数据。我正在使用 Suse Linux。

文件A

C    02020 Two-component system [PATH:aap02020]
D      NT05HA_1798 sensor protein CpxA  
D      NT05HA_1797 CpxR K07662 cpxR
C    02030 *Bacterial chemotaxis* [PATH:aap02030]
D      NT05HA_0919 maltose-binding periplasmic protein
D      NT05HA_0918 maltose-binding periplasmic protein 
C    03070 *Bacterial secretion system* [PATH:aap03070]
D      NT05HA_1309 protein-export membrane protein SecD 
D      NT05HA_1310 protein-export membrane protein SecF 
D      NT05HA_1819 preprotein translocase subunit SecE
D      NT05HA_1287 protein-export membrane protein  
C    02060 Phosphotransferase system (PTS) [PATH:aap02060]
D      NT05HA_0618 phosphoenolpyruvate-protein 
D      NT05HA_0617 phosphocarrier protein HPr 
D      NT05HA_0619 pts system 
Run Code Online (Sandbox Code Playgroud)

文件B

Bacterial chemotaxis
Bacterial secretion system
Run Code Online (Sandbox Code Playgroud)

期望输出:

C    02030 *Bacterial chemotaxis* [PATH:aap02030]
D      NT05HA_0919 maltose-binding periplasmic protein
D      NT05HA_0918 maltose-binding periplasmic protein 
C    03070 *Bacterial secretion system* [PATH:aap03070]
D      NT05HA_1309 protein-export membrane protein SecD
D      NT05HA_1310 protein-export membrane protein SecF
D      NT05HA_1819 preprotein translocase subunit SecE  
D      NT05HA_1287 protein-export membrane protein  
Run Code Online (Sandbox Code Playgroud)

oli*_*liv 5

你可以使用awk

awk 'NR==FNR{         # On the first file,
       a[$0];         # store the content in the array a
       next
     } 
     {                        # On the second file, 
         for(i in a)          # for all element in the array a,
            if(index($0,i)) { # check if there is match in the current record
               print "C" $0   # in that case print it with the record separator
               next
            }
     }' fileB RS='\nC' fileA
C    02030 *Bacterial chemotaxis* [PATH:aap02030]
D      NT05HA_0919 maltose-binding periplasmic protein
D      NT05HA_0918 maltose-binding periplasmic protein 
C    03070 *Bacterial secretion system* [PATH:aap03070]
D      NT05HA_1309 protein-export membrane protein SecD 
D      NT05HA_1310 protein-export membrane protein SecF 
D      NT05HA_1819 preprotein translocase subunit SecE
D      NT05HA_1287 protein-export membrane protein  
Run Code Online (Sandbox Code Playgroud)