Why is my PHP regex that parses Markdown links broken?

Question

Why is my PHP regex that parses Markdown links broken?

$pattern = "/\[(.*?)\]\((.*?)\)/i";
$replace = "<a href=\"$2\" rel=\"nofollow\">$1</a>";
$text = "blah blah [LINK1](http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?";
echo preg_replace($pattern, $replace, $text);

The above works, but if a space is accidentally inserted between the [] and the (), everything breaks and the two links are merged into one:

$text = "blah blah [LINK1] (http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?";

I have a feeling that the loose star is what breaks it, but I don't know how else to match repeated links.

Answer 1

Jar*_*mex 7

If I understand you correctly, all you really need to do is also match any number of spaces between the two parts, for example:

/\[([^]]*)\] *\(([^)]*)\)/i

Explanation:

\[             # Matches the opening square bracket (escaped)
([^]]*)        # Captures any number of characters that aren't closing square brackets
\]             # Matches the closing square bracket (escaped)
 *             # Matches any number of spaces
\(             # Matches the opening parenthesis (escaped)
([^)]*)        # Captures any number of characters that aren't closing parentheses
\)             # Matches the closing parenthesis (escaped)
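
As a quick check, the pattern can be run against the text from the question, including the accidental space after [LINK1]. This is only a minimal sketch: the variable name $html is introduced here for illustration, and the strings are single-quoted so the backslashes reach the regex engine unchanged.

```php
<?php
// Pattern from this answer: spaces are tolerated between "]" and "(".
$pattern = '/\[([^]]*)\] *\(([^)]*)\)/i';
$replace = '<a href="$2" rel="nofollow">$1</a>';

// Input from the question, with a stray space after [LINK1].
$text = 'blah blah [LINK1] (http://example.com) blah [LINK2](http://sub.example.com/) blah blah ?';

// $html is a hypothetical name used only in this sketch.
$html = preg_replace($pattern, $replace, $text);
echo $html, PHP_EOL;
// Each link is replaced on its own:
// blah blah <a href="http://example.com" rel="nofollow">LINK1</a> blah <a href="http://sub.example.com/" rel="nofollow">LINK2</a> blah blah ?
```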

Rationale:

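I should probably justify why I changed your .*? to [^]]*.

The second version is more efficient because it doesn't need the large amount of backtracking that .*? does. Also, once an opening [ is encountered, the .*? version keeps scanning until it finds some match, instead of failing when the text is not the tag we want. For example, if we match the .*? expression against: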

Sad face :[ blah [LINK1](http://sub.example.com/) blah

it will match

[ blah [LINK1]

and

http://sub.example.com/

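Using the [^]]* approach means the match fails as soon as a bracketed span is not followed by a parenthesized URL, so two separate links can no longer be merged into one. Note that [^]]* still allows a [ inside the link text, so for the exact input above it also starts at the first [; excluding both brackets, as in [^\[\]]*, makes the match start at [LINK1].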
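
The difference between the patterns can be checked with a small comparison. This is a sketch under assumptions: the helper linkTexts() and the variable names are hypothetical, and the $strict pattern, which also excludes [ from the link text, is not part of the original answer.

```php
<?php
// Hypothetical helper: returns every captured link text (group 1).
function linkTexts(string $pattern, string $text): array {
    preg_match_all($pattern, $text, $m);
    return $m[1];
}

$lazy    = '/\[(.*?)\]\((.*?)\)/';           // pattern from the question
$negated = '/\[([^]]*)\] *\(([^)]*)\)/';      // pattern from the answer
$strict  = '/\[([^\[\]]*)\] *\(([^)]*)\)/';   // assumption: also exclude "[" from link text

$spaced = 'blah [LINK1] (http://example.com) blah [LINK2](http://sub.example.com/)';
$sad    = 'Sad face :[ blah [LINK1](http://sub.example.com/) blah';

// With the stray space, the lazy pattern merges both links into one capture.
var_export(linkTexts($lazy, $spaced));    // ['LINK1] (http://example.com) blah [LINK2']
var_export(linkTexts($negated, $spaced)); // ['LINK1', 'LINK2']

// A stray "[" is still captured unless "[" is excluded as well.
var_export(linkTexts($lazy, $sad));       // [' blah [LINK1']
var_export(linkTexts($negated, $sad));    // [' blah [LINK1']
var_export(linkTexts($strict, $sad));     // ['LINK1']
```

The negated class fixes the merged-links problem from the question on its own; the stricter class is only needed if a lone [ can appear in the surrounding text.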

Asked:	13 years, 6 months ago
Viewed:	881 times
Last activity:	13 years, 6 months ago