I have this file:
a=1 b=2 1234j12342134h d="a v" id="y_123456" something else
a=1 b=2 1234j123421341 d="a" something else
a=1 b=2 1234j123421342 d="a D v id=" id="y_123458" something else
a=1 b=2 1234j123421344 d="a v" something else
a=1 b=2 1234j123421346 d="a.a." id="y_123410" something else
I want to keep only the lines that contain 'id=', and print only the third column and the id value. The output should be:
1234j12342134h id="y_123456"
1234j123421342 id="y_123458"
1234j123421346 id="y_123410"
or
1234j12342134h "y_123456"
1234j123421342 "y_123458"
1234j123421346 "y_123410"
or even
1234j12342134h y_123456
1234j123421342 y_123458
1234j123421346 y_123410
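I tried grep -o with a pattern for the start and end of the id, but that loses the ID in the first block (the third column). I tried awk, but it fails because some of the quoted values contain spaces.
I'm using Java, but it has become slow as the log files have grown.
How can I do this with bash utilities?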
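The awk failure comes from field splitting: awk splits each line on runs of blanks, so a quoted value that contains spaces becomes several fields and the id="..." field lands at a different position on each line. The sketch below assumes POSIX sh and awk; the helper names sample_input and id_field_numbers are hypothetical. It prints the third column and the number of every field that begins with id=".

```shell
#!/bin/sh
# Hypothetical helper: the sample input from the question.
sample_input() {
  cat <<'EOF'
a=1 b=2 1234j12342134h d="a v" id="y_123456" something else
a=1 b=2 1234j123421341 d="a" something else
a=1 b=2 1234j123421342 d="a D v id=" id="y_123458" something else
a=1 b=2 1234j123421344 d="a v" something else
a=1 b=2 1234j123421346 d="a.a." id="y_123410" something else
EOF
}

# Hypothetical helper: for each line, print $3 followed by the numbers of
# all fields that start with id=" (default blank-separated fields).
id_field_numbers() {
  awk '{
    out = $3 ":"
    for (i = 1; i <= NF; i++)
      if (index($i, "id=\"") == 1) out = out " " i
    print out
  }'
}

sample_input | id_field_numbers
```

This prints field 6 for the first id line and field 5 for the last, while the third line reports both 7 and 8 because its d value itself contains id=". No fixed $N can select the id field, which is why the answer below matches on the line text instead of on field numbers.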
Using GNU awk (for the third argument to match()):
$ gawk 'match($0,/id="[^" ]+"/,a){ print $3, a[0] }' file
1234j12342134h id="y_123456"
1234j123421342 id="y_123458"
1234j123421346 id="y_123410"
With other awks:
$ awk 'match($0,/id="[^" ]+"/){ print $3, substr($0,RSTART,RLENGTH) }' file
1234j12342134h id="y_123456"
1234j123421342 id="y_123458"
1234j123421346 id="y_123410"
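The bracket expression [^" ]+ is what skips the decoy on the third sample line, where the d value itself ends in id=". A looser [^"]* accepts an empty or space-containing value and matches that decoy first. A minimal comparison in POSIX sh and awk; sample_line, loose and strict are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helper: the sample line that contains a decoy id=" inside d="...".
sample_line() {
  printf '%s\n' 'a=1 b=2 1234j123421342 d="a D v id=" id="y_123458" something else'
}

# Looser pattern: the value may be empty and may contain spaces.
loose()  { awk 'match($0,/id="[^"]*"/){ print $3, substr($0,RSTART,RLENGTH) }'; }

# Pattern from the answer: at least one character, none a quote or a space.
strict() { awk 'match($0,/id="[^" ]+"/){ print $3, substr($0,RSTART,RLENGTH) }'; }

sample_line | loose     # 1234j123421342 id=" id="
sample_line | strict    # 1234j123421342 id="y_123458"
```

On the other sample lines both patterns give the same result; only the decoy line tells them apart.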
Or, if you want to strip the leading/trailing characters, there are two ways:
$ gawk 'match($0,/id="([^" ]+)"/,a){ print $3, a[1] }' file
1234j12342134h y_123456
1234j123421342 y_123458
1234j123421346 y_123410
Or:
$ awk 'match($0,/id="[^" ]+"/){ print $3, substr($0,RSTART+4,RLENGTH-5) }' file
1234j12342134h y_123456
1234j123421342 y_123458
1234j123421346 y_123410
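The question also lists a middle format that keeps the quotes ("y_123456"). Assuming the same portable match() call, a sketch only changes the substr() offsets: skip the three characters id= instead of four, and shorten the length by three instead of five. For id="y_123456", RLENGTH is 13, so RLENGTH-3 keeps the 10 characters "y_123456". The helper names sample_input and quoted_ids are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: the sample input from the question.
sample_input() {
  cat <<'EOF'
a=1 b=2 1234j12342134h d="a v" id="y_123456" something else
a=1 b=2 1234j123421341 d="a" something else
a=1 b=2 1234j123421342 d="a D v id=" id="y_123458" something else
a=1 b=2 1234j123421344 d="a v" something else
a=1 b=2 1234j123421346 d="a.a." id="y_123410" something else
EOF
}

# Hypothetical helper: print $3 and the quoted id value.
# RSTART+3 skips the 3 characters id=, RLENGTH-3 keeps the rest, quotes included.
quoted_ids() {
  awk 'match($0,/id="[^" ]+"/){ print $3, substr($0,RSTART+3,RLENGTH-3) }'
}

sample_input | quoted_ids
```

Lines without a matching id are skipped, so this prints three lines, each ending in a quoted value such as "y_123458".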