Extract specific characters from each line

Question

I have a text file, and from each line I want to extract the string that follows "OS=".

Input file lines:
A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1
K3Y356_SETIT ATP-dependent DNA helicase OS=Setaria italica OX=4555 PE=3 SV=1


Desired output:

OS=Arundo donax
OS=Setaria italica


or

Arundo donax
Setaria italica


Answer 1

pLu*_*umo 7

Using GNU (or compatible) grep with extended regular expressions:

grep -Eo "OS=\w+ \w+" file


Or with basic regular expressions (you need to escape the +):

grep -o "OS=\w\+ \w\+" file
# or
grep -o "OS=\w* \w*" file

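The word-based patterns above match exactly two runs of word characters after OS=, so a name with a third word or a hyphen would presumably be cut short. A minimal sketch to check the ERE and BRE forms against the two sample lines from the question (the temporary file is an assumption of this sketch, standing in for "file"):

```shell
# Write the two sample lines from the question to a scratch file.
tmp=$(mktemp)
printf '%s\n' \
  'A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1' \
  'K3Y356_SETIT ATP-dependent DNA helicase OS=Setaria italica OX=4555 PE=3 SV=1' \
  > "$tmp"

# Extended regex: + needs no escaping.
ere_out=$(grep -Eo 'OS=\w+ \w+' "$tmp")

# Basic regex: + must be written as \+ (a GNU extension).
bre_out=$(grep -o 'OS=\w\+ \w\+' "$tmp")

# Both forms should print the same two matches.
printf '%s\n' "$ere_out"

rm -f "$tmp"
```

Note that \w is itself a GNU extension, so this sketch assumes GNU grep or a compatible implementation.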

To get everything between OS= and OX=, you can use grep with Perl-compatible regular expressions (PCRE, the -P option), if available, together with a lookahead:

grep -Po "OS=.*(?=OX=)" file

# to also leave out "OS=", use a lookbehind
grep -Po "(?<=OS=).*(?=OX=)" file
# or use \K, which drops everything matched so far from the output
grep -Po "OS=\K.*(?=OX=)" file

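One detail worth checking: with a lookahead of (?=OX=), the greedy .* stops right before OX=, so the blank that precedes OX= stays in the match. A small sketch, assuming a grep built with -P support, that compares this with a lookahead that also consumes the blank:

```shell
# One sample line from the question.
line='A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1'

# Lookahead without the space: the match keeps the blank before OX=.
with_space=$(printf '%s\n' "$line" | grep -Po 'OS=\K.*(?=OX=)')

# Lookahead including the space: the blank is no longer part of the match.
trimmed=$(printf '%s\n' "$line" | grep -Po 'OS=\K.*(?= OX=)')

# Brackets make any trailing blank visible.
printf '[%s]\n' "$with_space" "$trimmed"
```

If the trailing blank matters for later processing, the second form may be the safer choice.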

Or use grep to include OX= in the match and remove it afterwards with sed:

grep -o "OS=.*\( OX=\)" file | sed 's/ OX=$//'


Output:

OS=Arundo donax
OS=Setaria italica

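Where neither -o nor -P is available, a similar result can probably be had with sed alone. This is a swapped-in technique rather than part of the answer above: a sketch that captures the text between OS= and " OX=" and prints only the lines where the substitution succeeded.

```shell
# The two sample lines from the question, piped straight into sed.
species=$(printf '%s\n' \
  'A0A0A9PBI3_ARUDO Uncharacterized protein OS=Arundo donax OX=35708 PE=4 SV=1' \
  'K3Y356_SETIT ATP-dependent DNA helicase OS=Setaria italica OX=4555 PE=3 SV=1' |
  # -n suppresses default output; the p flag prints only substituted lines.
  # \(.*\) captures everything between "OS=" and " OX=".
  sed -n 's/.*OS=\(.*\) OX=.*/\1/p')

printf '%s\n' "$species"
```

To keep the OS= prefix, the replacement can be written as OS=\1 instead of \1.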
