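I need to use a regular expression to wrap every link in a piece of text in an "a" tag, except for links that are already wrapped.
So I have this text: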
Some text with html here
http://www.somelink.html
http://www.somelink.com/view/?id=95
<a href="http://anotherlink.html">http://anotherlink.html</a>
<a href="http://anotherlink.html">Title</a>
What I need to get:
Some text with html here
<a href="http://www.somelink.html">http://www.somelink.html</a>
<a href="http://www.somelink.com/view/?id=95">http://www.somelink.com/view/?id=95</a>
<a href="http://anotherlink.html">http://anotherlink.html</a>
<a href="http://anotherlink.html">Title</a>
I can match links with this expression:
(?:(?:https?|ftp):\/\/|www.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]
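But it also matches the ones that are already inside "a" tags.
For reliability, I would split the text on <a> tags (including their contents) and on any other tags (excluding their contents), for example: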
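// Split on whole <a>...</a> elements and on any other single tag, keeping the delimiters.
// The limit is -1 (no limit); passing null there is deprecated as of PHP 8.1.
$bits = preg_split('/(<a(?:\s+[^>]*)?>.*?<\/a>|<[a-z][^>]*>)/is', $content, -1, PREG_SPLIT_DELIM_CAPTURE);
$reconstructed = '';
foreach ($bits as $bit) {
    // A piece that does not start with "<" is plain text, so check it for URLs.
    // link_urls() is not defined here; it stands for your own function that wraps matched URLs.
    if (strpos($bit, '<') !== 0) {
        $bit = link_urls($bit);
    }
    $reconstructed .= $bit;
}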
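The snippet above calls link_urls() without defining it. What follows is a minimal sketch of one way it could look, reusing the expression from the question. The function body, the wrapper name wrap_bare_links(), the escaped dot in "www\.", the added "i" flag, and the choice to prefix bare "www." matches with "http://" in the href are all assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: wraps every URL found in a plain-text fragment in an <a> tag.
function link_urls(string $text): string
{
    // The expression from the question; the dot after "www" is escaped and
    // the "i" flag is added (both are assumptions of this sketch).
    $pattern = '/(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';

    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m[0];
        // Assumption: a match that starts with "www." gets an http:// scheme in the href.
        $href = stripos($url, 'www.') === 0 ? 'http://' . $url : $url;
        // The input is assumed to be HTML already, so no extra escaping is applied.
        return '<a href="' . $href . '">' . $url . '</a>';
    }, $text);
}

// Hypothetical wrapper around the splitting loop shown above.
function wrap_bare_links(string $content): string
{
    // Delimiters are whole <a>...</a> elements or any other single tag.
    $bits = preg_split(
        '/(<a(?:\s+[^>]*)?>.*?<\/a>|<[a-z][^>]*>)/is',
        $content,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $reconstructed = '';
    foreach ($bits as $bit) {
        // Only pieces that do not start with "<" are plain text.
        if (strpos($bit, '<') !== 0) {
            $bit = link_urls($bit);
        }
        $reconstructed .= $bit;
    }
    return $reconstructed;
}
```

Calling wrap_bare_links() on the sample text from the question should leave the two existing <a> elements alone and wrap only the two bare URLs. One limitation of this sketch: a URL that appears inside an attribute of some other tag is skipped because the whole tag is treated as a delimiter, which is probably what you want here.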