How do I match text with different endings using a regular expression?

Question

This is what I have right now:

<h2>Information</h2>\n  +<p>(.*)<br />|</p>
                  ^ that is a tab space, didn't know if there was
 a better way to represent one or more (it seems to work)

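I'm trying to match the 'bla bla.' text, but my current regex doesn't quite work: it matches most of the line, whereas I want it to match only the first part, up to the first <br />, in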

<h2>Information</h2>
    <p>bla bla.<br /><br /><a href="http://www.google.com">google</a><br />

or

<h2>Information</h2>
    <p>bla bla.</p> other code...

Oh, and my PHP code:

    preg_match('#h2>Information</h2>\n  +<p>(.*)<br />|</p>#', $result, $postMessage);

Answer 1

Mar*_*ers 6

Don't use regular expressions to parse HTML. PHP provides DOMDocument, which can be used for this purpose.
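
A minimal sketch of the DOMDocument approach, assuming the markup looks like the first snippet in the question. The sample string, the XPath expression, and the variable names are illustrative assumptions rather than anything from the original post.

```php
<?php
// Hypothetical sample input, modeled on the first snippet in the question
// (a closing </p> is added here for tidiness).
$html = "<h2>Information</h2>\n\t<p>bla bla.<br /><br />"
      . "<a href=\"http://www.google.com\">google</a><br /></p>";

$doc = new DOMDocument();
// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// First <p> sibling that follows an <h2> whose text is "Information".
$nodes = $xpath->query(
    '//h2[normalize-space(.)="Information"]/following-sibling::p[1]'
);

$postMessage = null;
if ($nodes !== false && $nodes->length > 0) {
    // Use the first text node directly inside the <p>, ignoring <br /> and <a>.
    foreach ($nodes->item(0)->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $postMessage = trim($child->nodeValue);
            break;
        }
    }
}

echo $postMessage, "\n";
```

Because the parser works on the document tree, this does not depend on whether the paragraph text is followed by <br /> or </p>, or on how the lines are indented.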

That said, your regular expression has a few errors:

You need parentheses around the alternation.
You need a lazy quantifier.
You can't write "Header" to match "Information".

With these changes, it would look like this:

<h2>.*?</h2>\n\t+<p>.*?(<br />|</p>)
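
As a rough illustration, the corrected pattern could be used with preg_match as sketched below. The capturing group (.*?) around the paragraph text and the non-capturing (?:...) around the alternation are assumptions added here so that the text lands in index 1; the sample inputs are hypothetical and assume tab indentation with Unix line endings.

```php
<?php
// Hypothetical inputs covering both endings from the question.
$inputs = [
    "<h2>Information</h2>\n\t<p>bla bla.<br /><br /><a href=\"http://www.google.com\">google</a><br />",
    "<h2>Information</h2>\n\t<p>bla bla.</p> other code...",
];

// (.*?) is lazy, so it stops at the first <br /> or </p> it can.
// (?:...) groups the alternation without creating another capture.
$pattern = '#<h2>.*?</h2>\n\t+<p>(.*?)(?:<br />|</p>)#';

$captured = [];
foreach ($inputs as $result) {
    if (preg_match($pattern, $result, $postMessage)) {
        $captured[] = $postMessage[1];
        echo $postMessage[1], "\n";
    }
}
```

Without the parentheses, the alternation splits the whole pattern in two, so a bare </p> anywhere in the input would count as a match.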

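Your regular expression is also very fragile. For example, if the input contains spaces instead of tabs, or if the line endings are Windows-style, the regex will fail. Using a proper HTML parser will give you a more robust solution.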
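
To make the fragility concrete, here is a small sketch with a hypothetical input that uses a Windows line ending and spaces for indentation. The $tolerant pattern, which swaps \n\t+ for \s*, is an assumption shown only for comparison; it is still a regex and remains less robust than a parser.

```php
<?php
// Hypothetical input: \r\n line ending and four spaces instead of a tab.
$result = "<h2>Information</h2>\r\n    <p>bla bla.</p> other code...";

// Strict pattern: expects exactly \n followed by one or more tabs.
$strict = '#<h2>.*?</h2>\n\t+<p>(.*?)(?:<br />|</p>)#';
// Tolerant variant: \s* accepts any run of whitespace (\r, \n, spaces, tabs).
$tolerant = '#<h2>.*?</h2>\s*<p>(.*?)(?:<br />|</p>)#';

$strictMatched = preg_match($strict, $result);

$text = null;
if (preg_match($tolerant, $result, $m)) {
    $text = $m[1];
}

echo $strictMatched === 1 ? "strict: match\n" : "strict: no match\n";
echo $text, "\n";
```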

+1 for not using regex. For details, see http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags. (4 upvotes)
