RegEx: matching text that is not inside, or part of, an HTML tag

Question


How can I match everything that lies outside of HTML tags?

My pseudo-HTML is:

<h1>aaa</h1>
bbb <img src="bla" /> ccc
<div>ddd</div>


I used the regular expression

(?<=^|>)[^><]+?(?=<|$)


which gives me: "aaa bbb ccc ddd"

All I need is a way to ignore the HTML tags so that I get only: "bbb ccc"
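
For reference, the behaviour described above can be reproduced with a short JavaScript sketch. It assumes an engine with lookbehind support (ES2018 or later), and the variable names are illustrative only. The pattern accepts any run of text that sits between a `>` and a `<`, so it cannot tell text nested inside `<h1>` or `<div>` apart from the loose text around `<img>`.

```javascript
// Sample markup from the question.
const html = '<h1>aaa</h1>\nbbb <img src="bla" /> ccc\n<div>ddd</div>';

// The pattern from the question: a run of non-bracket characters that
// follows '>' (or the start of input) and precedes '<' (or the end).
const betweenTags = /(?<=^|>)[^><]+?(?=<|$)/g;

// Collect every match and trim the surrounding whitespace and newlines.
const matches = Array.from(html.matchAll(betweenTags), m => m[0].trim());

console.log(matches.join(' ')); // prints "aaa bbb ccc ddd"
```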

Answer 1

kar*_*m79 6

Regular expressions are a clumsy and unreliable way to work with markup. I would suggest using a DOM parser such as SimpleHtmlDom:

//Get the textual content of all hyperlinks on the specified page.
//You can use selectors, e.g. 'a.pretty' - see the docs.
include 'simple_html_dom.php';
foreach (file_get_html('http://www.example.org')->find('a') as $link) {
    echo $link->plaintext, "\n";
}


If you want to do this on the client side, you can use a library such as jQuery:

$('a').each(function() {
    alert($(this).text());
});
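
Applied to the markup in the question, the same DOM-based idea might look like the sketch below. It is not part of the original answer: `topLevelText` is a hypothetical helper, and it assumes that "outside the tags" means text nodes that are direct children of the container rather than text nested inside `<h1>` or `<div>`. It uses only standard DOM properties (`childNodes`, `nodeType`, `textContent`), so no library is needed.

```javascript
// Hypothetical helper: collect the text nodes that are direct children of
// `root`, skipping text nested inside any element (such as <h1> or <div>).
function topLevelText(root) {
  const TEXT_NODE = 3; // numeric value of Node.TEXT_NODE in the DOM standard
  return Array.from(root.childNodes)
    .filter(node => node.nodeType === TEXT_NODE)
    .map(node => node.textContent.trim())
    .filter(text => text.length > 0) // drop whitespace-only nodes
    .join(' ');
}

// Browser-only usage: DOMParser is not available in plain Node.js,
// so this part is skipped when no DOM implementation is present.
if (typeof DOMParser !== 'undefined') {
  const html = '<h1>aaa</h1>\nbbb <img src="bla" /> ccc\n<div>ddd</div>';
  const doc = new DOMParser().parseFromString(html, 'text/html');
  console.log(topLevelText(doc.body));
}
```

Because the parser builds the element tree first, nesting is handled by the tree structure instead of by pattern matching, which is the part the regular expression cannot express.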

