Ste*_*ell 3 regex grep protein-database
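I'm sure this has been asked before, but I can't find it, so I apologize for the redundancy.
I want to use grep or egrep to find every line that contains either "P" or "CA" and write those lines to a new file. I can easily do it for one or the other using: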
egrep ' CA ' all.pdb > CA.pdb
or
egrep ' P ' all.pdb > P.pdb
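I'm new to regex, so I'm not sure what the syntax for "or" is.
Update: The order of the output lines matters, i.e. I do not want the output grouped or sorted by which string matched. Here are the first lines of one file as an example: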
ATOM 1 N THR U 27 -68.535 88.128 -17.857 1.00 0.00 1H5 N
ATOM 2 HT1 THR U 27 -69.437 88.216 -17.434 0.00 0.00 1H5 H
ATOM 3 HT2 THR U 27 -68.270 87.165 -17.902 0.00 0.00 1H5 H
ATOM 4 HT3 THR U 27 -68.551 88.520 -18.777 0.00 0.00 1H5 H
ATOM 5 CA LYS B 122 -116.643 85.931-103.890 1.00 0.00 2H2B C
ATOM 6 P THY J 2 -73.656 70.884 -7.805 1.00 0.00 DNA2 P
ATOM 8 HB THR U 27 -68.543 88.566 -15.171 0.00 0.00 1H5 H
ATOM 9 CA LYS B 122 -116.643 85.931-103.890 1.00 0.00 2H2B C
ATOM 10 P THY J 2 -73.656 70.884 -7.805 1.00 0.00 DNA2 P
ATOM 11 HB THR U 27 -68.543 88.566 -15.171 0.00 0.00 1H5 H
ATOM 12 C SER D 2 -73.656 70.884 -7.805 1.00 0.00 DNA2 C
ATOM 13 OP1 SER D 2 -73.656 70.884 -7.805 1.00 0.00 DNA2 O
For this example, I would like the resulting file to be:
ATOM 5 CA LYS B 122 -116.643 85.931-103.890 1.00 0.00 2H2B C
ATOM 6 P THY J 2 -73.656 70.884 -7.805 1.00 0.00 DNA2 P
ATOM 9 CA LYS B 122 -116.643 85.931-103.890 1.00 0.00 2H2B C
ATOM 10 P THY J 2 -73.656 70.884 -7.805 1.00 0.00 DNA2 P
fed*_*qui 10
You can use grep like this:
grep ' P \| CA ' file > new_file
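The | expression means "or". In a basic regular expression we have to escape it as \| to tell grep that it has a special meaning.
You can avoid the escaping by using extended regular expressions with grep -E: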
grep -E ' (P|CA) ' file > new_file
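Since the question asks that the output keep the input order, here is a minimal sketch that checks this on a tiny, made-up sample (the four records and the temporary file are hypothetical, not taken from all.pdb). It also shows that passing several -e patterns is another way to express "or".

```shell
# Hypothetical sample: four short records, made up for illustration.
tmp=$(mktemp)
printf '%s\n' \
  'ATOM 1 N THR U' \
  'ATOM 2 CA LYS B' \
  'ATOM 3 P THY J' \
  'ATOM 4 CA LYS B' > "$tmp"

# Alternation with extended regular expression syntax.
out_ere=$(grep -E ' (P|CA) ' "$tmp")

# Equivalent form: multiple -e patterns are ORed together.
out_multi=$(grep -e ' P ' -e ' CA ' "$tmp")

# grep prints matching lines in the order they appear in the input,
# so records 2, 3 and 4 come out interleaved, not grouped by pattern.
printf '%s\n' "$out_ere"

rm -f "$tmp"
```

The -e form may be worth preferring in portable scripts, because \| in a basic regular expression is a GNU extension rather than something POSIX requires.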
In general, I prefer the awk syntax, because it is clearer and easier to extend:
awk '/ P / || / CA /' file
Or, given your sample input, you can use awk to check whether the value occurs in the 3rd column:
$ awk '$3=="CA" || $3=="P"' file
ATOM 5 CA LYS B 122 -116.643 85.931-103.890 1.00 0.00 2H2B C
ATOM 6 P THY J 2 -73.656 70.884 -7.805 1.00 0.00 DNA2 P
ATOM 9 CA LYS B 122 -116.643 85.931-103.890 1.00 0.00 2H2B C
ATOM 10 P THY J 2 -73.656 70.884 -7.805 1.00 0.00 DNA2 P
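One reason the column check can be safer than a substring match: " P " may also appear in another field, for example as a chain identifier. The sketch below illustrates the difference on three made-up records (the data and the temporary file are assumptions for illustration only).

```shell
# Hypothetical sample: in the first record, P is the chain ID (field 5),
# not the atom name (field 3).
tmp=$(mktemp)
printf '%s\n' \
  'ATOM 1 N THR P 27' \
  'ATOM 2 CA LYS B 122' \
  'ATOM 3 P THY J 2' > "$tmp"

# Substring match: also selects record 1 because of the chain ID.
by_regex=$(grep -E ' (P|CA) ' "$tmp")

# Field match: only records whose third field is exactly CA or P.
by_field=$(awk '$3 == "CA" || $3 == "P"' "$tmp")

printf 'regex:\n%s\nfield:\n%s\n' "$by_regex" "$by_field"

rm -f "$tmp"
```

To save the result rather than print it, redirect the awk output in the same way as with grep, e.g. `awk '$3 == "CA" || $3 == "P"' file > new_file`.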
Test on a small sample file:
$ cat file
hello P is here and CA also
but CA appears
nothing here
P CA
$ grep ' P \| CA ' file
hello P is here and CA also
but CA appears
$ grep -E ' (P|CA) ' file
hello P is here and CA also
but CA appears
$ awk '/ P / || / CA /' file
hello P is here and CA also
but CA appears