grep: ignoring a pattern

Question


I am using cURL to extract URLs from a website, as shown below.

curl www.somesite.com | grep "<a href=.*title=" > new.txt


My new.txt file looks like this.

<a href="http://website1.com" title="something">
<a href="http://website1.com" information="something" title="something">
<a href="http://website2.com" title="some_other_thing">
<a href="http://website2.com" information="something" title="something">
<a href="http://websitenotneeded.com" title="something NOTNEEDED">


However, I only need to extract the following lines.

<a href="http://website1.com" title="something">
<a href="http://website2.com" information="something" title="something">


I am trying to ignore the <a href entries that contain information and whose title ends with NOTNEEDED.

How do I modify my grep command?

Answer 1

slm*_*slm 18

I don't completely follow your example and description, but it sounds like this is what you want:

$ grep -v "<a href=.*title=.*NOTNEEDED" sample.txt 
<a href="http://website1.com" title="something">
<a href="http://website1.com" information="something" title="something">
<a href="http://website2.com" title="some_other_thing">
<a href="http://website2.com" information="something" title="something">


So for your example:

$ curl www.example.com | grep "<a href=.*title=" | grep -v NOTNEEDED > new.txt
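
To try the filter without fetching a live site, here is a minimal sketch that runs it against the sample lines from the question. The temporary file and the extra non-link line are assumptions added for illustration, and the second grep is anchored to the title attribute so that it only drops lines whose title ends in NOTNEEDED.

```shell
# Recreate the question's sample lines in a temporary file.
# The last line is an extra, hypothetical non-link line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a href="http://website1.com" title="something">
<a href="http://website1.com" information="something" title="something">
<a href="http://website2.com" title="some_other_thing">
<a href="http://website2.com" information="something" title="something">
<a href="http://websitenotneeded.com" title="something NOTNEEDED">
<p>not a link</p>
EOF

# First grep keeps anchor lines that carry a title attribute;
# second grep (-v) drops those whose title value ends in NOTNEEDED.
result=$(grep '<a href=.*title=' "$sample" | grep -v 'title="[^"]*NOTNEEDED"')
printf '%s\n' "$result"

rm -f "$sample"
```

Note that the first grep selects matching lines (no -v) and only the second one inverts; putting -v on both would throw away every link line.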


Answer 2

小智 11

The grep man page says:

-v, --invert-match
    Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX.)


You can exclude several patterns at once with a regular expression:

grep -v 'red\|green\|blue'


Or chain several inverted matches:

grep -v red | grep -v green | grep -v blue
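
The \| alternation inside a basic regular expression is a GNU extension, so it may not work with every grep. As a hedged sketch of two more portable spellings, the block below uses -E (extended regular expressions) and repeated -e options; the four-word input list is made up for illustration.

```shell
# Hypothetical input: one color per line.
colors=$(printf '%s\n' red green blue yellow)

# Extended regular expression: | is alternation without a backslash.
with_E=$(printf '%s\n' "$colors" | grep -Ev 'red|green|blue')

# Repeated -e: a line is dropped if it matches any of the patterns.
with_e=$(printf '%s\n' "$colors" | grep -v -e red -e green -e blue)

printf '%s\n' "$with_E" "$with_e"
```

Both forms run a single grep process, unlike the chained pipeline, and should leave only the line that matches none of the three patterns.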

