Regex/DOMDocument - match and replace text not in a link

Bry*_*ynJ 12 php regex xpath preg-replace domdocument

I need to find and replace all matches of a piece of text, case-insensitively, unless the text is inside an anchor tag. For example:

<p>Match this text and replace it</p>
<p>Don't <a href="/">match this text</a></p>
<p>We still need to match this text and replace it</p>

Searching for "match this text" should replace only the first and the last instance.

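[Edit] Per Gordon's comment, DOMDocument may be the better tool in this case. I'm not at all familiar with the DOMDocument extension and would really appreciate some basic examples of this functionality.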

Ist*_*ros 17

Here is a UTF-8 safe solution that works not only with properly formatted documents but also with document fragments.

mb_convert_encoding is needed because loadHtml() seems to have a bug with UTF-8 encoding (see the references below).
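
As a side note, the 'HTML-ENTITIES' target of mb_convert_encoding is deprecated as of PHP 8.2. A minimal sketch of the same idea with mb_encode_numericentity follows: every non-ASCII character is rewritten as a numeric entity so that loadHTML() only ever sees ASCII. The helper name loadUtf8Html is hypothetical, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper (not part of the answer): load a UTF-8 HTML string safely.
function loadUtf8Html(string $html): DOMDocument
{
    // Convert every code point from 0x80 upwards into a numeric entity (&#NNN;),
    // leaving plain ASCII, including the markup itself, untouched.
    $ascii = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    $dom->loadHTML($ascii);
    return $dom;
}

// "\u{00E9}" is the UTF-8 character e-acute.
$dom  = loadUtf8Html("<p>Caf\u{00E9}</p>");
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo $text, "\n";
```

The parser decodes the entities again, so the text nodes in the resulting DOM hold the original UTF-8 characters.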

mb_substr trims the body tag from the output, so you get back your original content without any additional markup.

<?php
$html = '<p>Match this text and replace it</p>
<p>Don\'t <a href="/">match this text</a></p>
<p>We still need to match this text and replace it??</p>
<p>This is <a href="#">a link <span>with <strong>don\'t match this text</strong> content</span></a></p>';

$dom = new DOMDocument();
// loadXml needs properly formatted documents, so it's better to use loadHtml, but it needs a hack to properly handle UTF-8 encoding
$dom->loadHtml(mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8"));

$xpath = new DOMXPath($dom);

foreach($xpath->query('//text()[not(ancestor::a)]') as $node)
{
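    // appendXML() parses its argument as XML, so a raw & or < in the text would make it fail; escape first
    $escaped  = htmlspecialchars($node->wholeText, ENT_NOQUOTES, 'UTF-8');
    $replaced = str_ireplace('match this text', 'MATCH', $escaped);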
    $newNode  = $dom->createDocumentFragment();
    $newNode->appendXML($replaced);
    $node->parentNode->replaceChild($newNode, $node);
}

// get only the body tag with its contents, then trim the body tag itself to get only the original content
echo mb_substr($dom->saveXML($xpath->query('//body')->item(0)), 6, -7, "UTF-8");
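
The offsets 6 and -7 in the last line are simply the lengths of "<body>" and "</body>". A minimal sketch that isolates this trimming step is shown here; the helper name innerBodyXml is hypothetical, and it assumes the body element has no attributes and at least one child.

```php
<?php
// Hypothetical helper (not part of the answer): return the markup inside <body>.
function innerBodyXml(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    $xml  = $dom->saveXML($body); // serialised as "<body>...</body>"

    // Skip strlen('<body>') = 6 characters at the start
    // and strlen('</body>') = 7 characters at the end.
    return mb_substr($xml, 6, -7, 'UTF-8');
}

$dom = new DOMDocument();
$dom->loadHTML('<p>Hello <a href="/">world</a></p>');

$inner = innerBodyXml($dom);
echo $inner, "\n"; // <p>Hello <a href="/">world</a></p>
```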

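References:
1. find and replace keywords by hyperlinks in an html fragment, via php dom
2. Regex/DOMDocument - match and replace text not in a link
3. php problem with russian language
4. Why does DOM change encoding?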

I read dozens of answers on this subject, so I apologize if I have forgotten someone (please leave a comment and I will add yours as well).

Thanks to Gordon and stillstanding for commenting on my other answer.


net*_*der 6

Try this:

$dom = new DOMDocument;
// $html_content is assumed to hold the HTML string to process
$dom->loadHTML($html_content);

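// Recursively applies preg_replace() to every text node under $dom,
// skipping text nodes whose direct parent is listed in $excludeParents.
function preg_replace_dom($regex, $replacement, DOMNode $dom, array $excludeParents = array()) {
  if ($dom->hasChildNodes()) {
    foreach ($dom->childNodes as $node) {
      if ($node instanceof DOMText) {
        // Only the immediate parent is checked, so text nested deeper inside
        // an excluded element (e.g. <a><span>...</span></a>) is still replaced.
        if (!in_array($node->parentNode->nodeName, $excludeParents)) {
          $node->nodeValue = preg_replace($regex, $replacement, $node->nodeValue);
        }
      } else {
        preg_replace_dom($regex, $replacement, $node, $excludeParents);
      }
    }
  }
}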

preg_replace_dom('/match this text/i', 'IT WORKS', $dom->documentElement, array('a'));
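
The function above compares only the direct parent of each text node, so text nested deeper inside a link, such as <a><span>...</span></a>, would still be replaced. A minimal sketch of a variant that does not descend into excluded elements at all is shown here; the function name replaceOutsideTags and the sample markup are illustrative assumptions, not part of the answer.

```php
<?php
// Hypothetical variant: run preg_replace() on every text node under $root,
// but never descend into elements whose tag name is listed in $excludedTags.
function replaceOutsideTags(string $regex, string $replacement, DOMNode $root, array $excludedTags): void
{
    foreach ($root->childNodes as $node) {
        if ($node instanceof DOMText) {
            // Changing the value of a text node does not alter the child list,
            // so it is safe to do while iterating.
            $node->nodeValue = preg_replace($regex, $replacement, $node->nodeValue);
        } elseif ($node instanceof DOMElement && !in_array($node->nodeName, $excludedTags, true)) {
            replaceOutsideTags($regex, $replacement, $node, $excludedTags);
        }
        // Excluded elements, comments and other node types are left untouched.
    }
}

$html = '<p>Match this text and replace it</p>'
      . '<p>Don\'t <a href="/">match this text</a></p>'
      . '<p>This is <a href="#">a link <span>with <strong>don\'t match this text</strong> content</span></a></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

replaceOutsideTags('/match this text/i', 'IT WORKS', $dom->documentElement, ['a']);

$out = $dom->saveHTML();
echo $out;
```

With this sample input, only the first paragraph is changed; both links, including the nested span and strong content, keep their original text.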