Extracting a specific string from curl results

Question

use*_*579 5 bash grep

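Given the following curl command: curl --user-agent "fogent" --silent -o page.html "http://www.google.com/search?q=insansiate"

* The spelling is intentionally incorrect. I want to get Google's suggestion as my result.

I would like to be able to run grep -oE against the page.html file, or pipe the output straight from curl and never store the file.

The result should be: "instantiate"

I need only the word "instantiate", or the phrase; whatever Google autocorrects the query to is what I'm after.

Here is the basic HTML that is returned: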

<span class=spell style="color:#cc0000">Did you mean: </span><a href="/search?hl=en&amp;ie=UTF-8&amp;&amp;sa=X&amp;ei=VEMUTMDqGoOINraK3NwL&amp;ved=0CB0QBSgA&amp;q=instantiate&amp;spell=1"class=spell><b><i>instantiate</i></b></a>&nbsp;&nbsp;<span class=std>Top 2 results shown</span>


So maybe match from/to the string below, which I hope is unique enough to cover all my bases.

class=spell><b><i>instantiate</i></b></a>&nbsp;&nbsp;


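I keep running into the problem of grep being greedy; maybe I should run the page through an HTML tidy tool first to get a line break or 50 in there. I don't know of any easy way to do that in bash, which would be the ideal for me. I really don't want to deal with firing up perl and making sure I have the right modules.

Any suggestions? Thanks.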
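
To illustrate the greedy-match problem described above, here is a minimal sketch that stays with grep -oE. It assumes the suggestion always follows the literal text class=spell><b><i>, as in the HTML shown; the function name extract_suggestion is hypothetical, and the sample href is abbreviated from the one in the question.

```shell
#!/usr/bin/env bash
# Sketch: pull the "Did you mean" suggestion out of HTML read from stdin.
# extract_suggestion is a hypothetical helper name, not from the question.
extract_suggestion() {
  # [^<]+ stops at the next '<', so the match cannot run on greedily
  # to a later tag on the same (very long) line.
  grep -oE 'class=spell><b><i>[^<]+' |
    head -n 1 |
    # Keep only the text after the last '>' of the matched prefix.
    grep -oE '[^>]+$'
}

# Sample based on the HTML in the question (href abbreviated).
sample='<span class=spell style="color:#cc0000">Did you mean: </span><a href="/search?hl=en&amp;q=instantiate&amp;spell=1"class=spell><b><i>instantiate</i></b></a>&nbsp;&nbsp;<span class=std>Top 2 results shown</span>'

printf '%s\n' "$sample" | extract_suggestion
```

To skip the page.html file entirely, the curl output could be piped in directly: curl --user-agent "fogent" --silent "http://www.google.com/search?q=insansiate" | extract_suggestion. The negated character class is what replaces a non-greedy quantifier here, since grep -E has none.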

Answer 1

Ign*_*ams 0

curl --> tidy -asxml --> xmlstarlet sel
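
The pipeline above might be spelled out roughly as follows. This is only a sketch: it assumes the tidy and xmlstarlet commands are installed, that tidy places its output in the XHTML namespace, and that the suggestion sits in an a element with class "spell" as in the question's HTML. The function name suggestion_via_xml is hypothetical, and the extra tidy options are additions not mentioned in the answer.

```shell
#!/usr/bin/env bash
# Sketch of the curl -> tidy -> xmlstarlet pipeline; assumes both the
# `tidy` and `xmlstarlet` commands are available on PATH.
# suggestion_via_xml is a hypothetical helper name.
suggestion_via_xml() {
  # tidy: -q quiet, -asxml convert the HTML to well-formed XHTML,
  # -numeric write entities such as &nbsp; as numeric references so an
  # XML parser accepts them. Warnings are suppressed and output is
  # forced because real-world pages are rarely clean.
  tidy -q -asxml -numeric --show-warnings no --force-output yes 2>/dev/null |
    # Elements end up in the XHTML namespace, so bind it to the prefix x
    # and select the text of the link carrying class "spell".
    xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" \
      -t -v "//x:a[@class='spell']" -n
}

# Usage (network call, shown as a comment only):
# curl --user-agent "fogent" --silent "http://www.google.com/search?q=insansiate" | suggestion_via_xml
```

Compared with a pure grep approach, this trades two extra tools for a real parser, so the match does not depend on how the markup happens to be broken into lines.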
