sha*_*zad 3 awk perl text-processing
I have a text file, and from each line I want to extract the string that follows "OS=".
Input file lines:
A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1
K3Y356_SETIT ATP-dependent DNA helicase OS=Setaria italica OX=4555 PE=3 SV=1
Desired output:
OS=Arundo donax
OS=Setaria italica
or
Arundo donax
Setaria italica
Using GNU grep (or a compatible implementation) with extended regular expressions:
grep -Eo "OS=\w+ \w+" file
or with basic regular expressions (where you need to escape the `+`):
grep -o "OS=\w\+ \w\+" file
# or
grep -o "OS=\w* \w*" file
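
One caveat about the two-word patterns above: they match exactly two `\w+` words after `OS=`, so a longer organism name gets cut off. A minimal sketch, assuming GNU grep (for `\w` and `-o`); the input line is a hypothetical example and does not come from the question's file:

```shell
# Hypothetical input line with a four-word organism name (not from the question's file).
line='Q0000_EXAMPLE Some protein OS=Oryza sativa subsp. japonica OX=39947 PE=1 SV=1'

# The two-word pattern stops after the second word.
two_words=$(printf '%s\n' "$line" | grep -Eo "OS=\w+ \w+")
printf '%s\n' "$two_words"
```

This prints only `OS=Oryza sativa`. If names of varying length can occur, matching everything up to `OX=` is probably the safer choice.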
To get everything from OS= up to OX=, you can use grep with Perl-compatible regular expressions (PCRE, the -P option), if available, together with a lookahead:
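# the lookahead includes the space before "OX=" so that
# the match does not end with a trailing space
grep -Po "OS=.*(?= OX=)" file
# to also leave out "OS=", use a lookbehind
grep -Po "(?<=OS=).*(?= OX=)" file
# or use \K, which drops everything matched so far from the output
grep -Po "OS=\K.*(?= OX=)" file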
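
To see the difference between the lookahead-only variant and the `\K` variant, here is a small self-contained sketch that recreates the two sample lines from the question in a temporary file. It assumes a GNU grep built with PCRE support; the lookahead here includes the space before `OX=` so that the matches carry no trailing space. The variable names are illustrative only.

```shell
# Recreate the two sample lines from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1
K3Y356_SETIT ATP-dependent DNA helicase OS=Setaria italica OX=4555 PE=3 SV=1
EOF

# Lookahead only: the match still starts with "OS=".
with_prefix=$(grep -Po "OS=.*(?= OX=)" "$sample")

# \K discards everything matched so far, so "OS=" is left out.
without_prefix=$(grep -Po "OS=\K.*(?= OX=)" "$sample")

printf '%s\n' "$with_prefix" "$without_prefix"
rm -f "$sample"
```

The first command yields the `OS=...` form of the desired output, the second the bare organism names.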
or use grep to match up to and including OX=, then remove it afterwards with sed:
grep -o "OS=.*\( OX=\)" file | sed 's/ OX=$//'
Output:
OS=Arundo donax
OS=Setaria italica
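
Since `-P` is only available in some grep builds, one possible way to handle both cases is to probe for it and fall back to plain sed. This is a sketch, not part of the commands above: `extract_os` is a hypothetical helper name, and the sed substitution is an alternative technique that captures the text between `OS=` and ` OX=` directly. It prints the second form of the desired output (without `OS=`).

```shell
# extract_os FILE: print the text between "OS=" and " OX=" for each matching line.
# (extract_os is a hypothetical helper name used for illustration.)
extract_os() {
    if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
        # grep accepts -P here, so use the PCRE variant with \K and a lookahead.
        grep -Po "OS=\K.*(?= OX=)" "$1"
    else
        # Fallback with POSIX sed: keep only the captured group, print matching lines.
        sed -n 's/.*OS=\(.*\) OX=.*/\1/p' "$1"
    fi
}

# Sample data taken from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1
K3Y356_SETIT ATP-dependent DNA helicase OS=Setaria italica OX=4555 PE=3 SV=1
EOF

result=$(extract_os "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Both branches should give the same result on the sample lines; they could differ on input where `OS=` or ` OX=` occurs more than once per line, which the question's data does not show.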