Use grep to find either of two strings without changing the order of the lines?

Ste*_*ell 3 regex grep protein-database

I'm sure this has been asked before, but I can't find it, so I apologize for the redundancy.

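I want to use grep or egrep to find every line that contains either "P" or "CA" and write those lines to a new file. I can easily do each search on its own with: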

egrep ' CA ' all.pdb > CA.pdb

or

egrep ' P ' all.pdb > P.pdb

I'm new to regex, so I'm not sure what the syntax for "or" is.

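Update: The order of the output lines matters, i.e. I don't want the output grouped or sorted by which string matched. Here is a sample of the first few lines of one file: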

ATOM      1 N    THR U  27     -68.535  88.128 -17.857  1.00  0.00      1H5  N  
ATOM      2 HT1  THR U  27     -69.437  88.216 -17.434  0.00  0.00      1H5  H  
ATOM      3 HT2  THR U  27     -68.270  87.165 -17.902  0.00  0.00      1H5  H  
ATOM      4 HT3  THR U  27     -68.551  88.520 -18.777  0.00  0.00      1H5  H  
ATOM      5 CA   LYS B 122    -116.643  85.931-103.890  1.00  0.00      2H2B C  
ATOM      6 P    THY J   2     -73.656  70.884  -7.805  1.00  0.00      DNA2 P  
ATOM      8 HB   THR U  27     -68.543  88.566 -15.171  0.00  0.00      1H5  H  
ATOM      9 CA   LYS B 122    -116.643  85.931-103.890  1.00  0.00      2H2B C  
ATOM     10 P    THY J   2     -73.656  70.884  -7.805  1.00  0.00      DNA2 P  
ATOM     11 HB   THR U  27     -68.543  88.566 -15.171  0.00  0.00      1H5  H  
ATOM     12 C    SER D   2     -73.656  70.884  -7.805  1.00  0.00      DNA2 C  
ATOM     13 OP1  SER D   2     -73.656  70.884  -7.805  1.00  0.00      DNA2 O  

For this example, I want the resulting file to be:

ATOM      5 CA   LYS B 122    -116.643  85.931-103.890  1.00  0.00      2H2B C  
ATOM      6 P    THY J   2     -73.656  70.884  -7.805  1.00  0.00      DNA2 P  
ATOM      9 CA   LYS B 122    -116.643  85.931-103.890  1.00  0.00      2H2B C  
ATOM     10 P    THY J   2     -73.656  70.884  -7.805  1.00  0.00      DNA2 P  

fed*_*qui 10

You can use grep like this:

grep ' P \| CA ' file > new_file

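The | means "or". In grep's default basic regular expressions a bare | is just a literal character, so we have to escape it as \| to tell grep that it has its special meaning (this escaped form is a GNU grep extension).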

You can avoid the escaping by using extended regular expressions with grep -E:

grep -E ' (P|CA) ' file > new_file
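To make the difference between the two modes concrete, here is a small sketch. The three input lines and the variable names (lines, literal, either) are made up for illustration. It only relies on the bare | being an ordinary character in basic regular expressions and an "or" in extended ones.

```shell
# Three hypothetical test lines; the third contains a literal " P | CA ".
lines=$(printf '%s\n' 'x P y' 'x CA y' 'x P | CA y')

# Basic regex: the bare | is a literal character, so only the line that
# actually contains " P | CA " matches.
literal=$(printf '%s\n' "$lines" | grep ' P | CA ')

# Extended regex (-E): | means "or", so every line containing " P " or
# " CA " matches.
either=$(printf '%s\n' "$lines" | grep -E ' P | CA ')

printf 'basic:\n%s\n\nextended:\n%s\n' "$literal" "$either"
```

If neither \| nor -E is convenient, grep also accepts several patterns through repeated -e options, e.g. grep -e ' P ' -e ' CA ' file, which should select the same lines.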

In general, I prefer the awk syntax, since it is clearer and easier to extend:

awk '/ P / || / CA /' file

Or, given your sample input, you can use awk to check whether the value appears in the 3rd column:

$ awk '$3=="CA" || $3=="P"' file
ATOM      5 CA   LYS B 122    -116.643  85.931-103.890  1.00  0.00      2H2B C
ATOM      6 P    THY J   2     -73.656  70.884  -7.805  1.00  0.00      DNA2 P
ATOM      9 CA   LYS B 122    -116.643  85.931-103.890  1.00  0.00      2H2B C
ATOM     10 P    THY J   2     -73.656  70.884  -7.805  1.00  0.00      DNA2 P
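If the list of atom names grows, the column test could be driven by a lookup table rather than a chain of || comparisons. This is only a sketch: the array name wanted, the file names, and the shortened sample lines are assumptions for illustration, and it presumes the atom name is always the third whitespace-separated field, as in the sample above.

```shell
# Build a small hypothetical input file in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'ATOM      4 HT3  THR U  27' \
  'ATOM      5 CA   LYS B 122' \
  'ATOM      6 P    THY J   2' \
  'ATOM     12 C    SER D   2' \
  'ATOM     13 OP1  SER D   2' > "$tmpdir/all.pdb"

# BEGIN builds a set of wanted atom names; the main rule prints a line
# when its third field is in that set. Lines are printed as they are read,
# so the input order is kept.
awk 'BEGIN { n = split("CA P", names, " "); for (i = 1; i <= n; i++) wanted[names[i]] = 1 }
     $3 in wanted' "$tmpdir/all.pdb" > "$tmpdir/CA_P.pdb"

cat "$tmpdir/CA_P.pdb"
```

Because the comparison is on the whole field, names such as OP1 or C are not picked up by accident, which is the same behaviour as the $3=="CA" || $3=="P" test.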

Test

$ cat file
hello P is here and CA also
but CA appears
nothing here
P CA
$ grep ' P \| CA ' file
hello P is here and CA also
but CA appears
$ grep -E ' (P|CA) ' file
hello P is here and CA also
but CA appears
$ awk '/ P / || / CA /' file
hello P is here and CA also
but CA appears
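The ordering concern from the question can be checked in the same way. As a rough sketch (the file name in.txt and its contents are invented for this check), running two separate greps and concatenating their output groups the lines by pattern, whereas a single grep with both alternatives prints lines in the order they appear in the input:

```shell
# Hypothetical input with P and CA lines interleaved.
tmpdir=$(mktemp -d)
printf '%s\n' 'one P x' 'two CA x' 'three N x' 'four P x' > "$tmpdir/in.txt"

# Two passes: all CA lines first, then all P lines (input order is lost).
two_pass=$( { grep ' CA ' "$tmpdir/in.txt"; grep ' P ' "$tmpdir/in.txt"; } )

# One pass: each line is tested as it is read, so input order is preserved.
one_pass=$(grep -E ' (P|CA) ' "$tmpdir/in.txt")

printf 'two passes:\n%s\n\none pass:\n%s\n' "$two_pass" "$one_pass"
```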