Strip tags and everything in between

Question

And*_*ndy 14 php

How can I strip <h1>including this content</h1>?

I know you can use strip_tags to remove the tags, but I want everything between them gone as well.

Any help would be appreciated.
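To make the problem concrete, here is a minimal sketch of what strip_tags does with such a string. The sample input is only an illustration chosen for this sketch:

```php
<?php
// Illustrative input: an <h1> element surrounded by plain text.
$html = 'Hello<h1>including this content</h1> There !!';

// strip_tags() removes the markup but keeps the text between the tags.
$stripped = strip_tags($html);

echo $stripped, PHP_EOL; // Helloincluding this content There !!
```

The heading text survives, which is exactly what the question wants to avoid.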

Answer 1

Gum*_*mbo 23

When dealing with HTML, you should use an HTML parser to process it correctly. You can use PHP's DOMDocument and query the elements with DOMXPath, for example:

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//h1') as $node) {
    $node->parentNode->removeChild($node);
}
$html = $doc->saveHTML();

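+1 for using a parser here. Do it once, and you won't have to revisit it when you (or another developer, or a client using a WYSIWYG editor) invalidate the regular expression. (3 upvotes)
+1 Nicely done. But can I search for multiple tags? Something like `$xpath->query('//h1 //script //div')`? (2 upvotes)
Since no one has replied to @asprin in 5.5 years, I figured I would. To query multiple tags, just use the familiar OR operator `|` (the XPath union operator). That means your code would look like `$xpath->query('//h1|//script|//div')` (2 upvotes)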
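The union query from the last comment can be wrapped into a small function. This is only a sketch: the helper name `removeElements`, its signature, and the sample markup are assumptions made for illustration, and the tag names are expected to come from trusted code because they are inserted into the XPath expression as-is.

```php
<?php
// Hypothetical helper for illustration; not part of the original answer.
function removeElements(string $html, array $tags): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them for imperfect markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Build a union expression such as '//h1|//script'.
    $query = implode('|', array_map(fn (string $tag): string => '//' . $tag, $tags));

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query($query) as $node) {
        // Remove the element together with everything inside it.
        $node->parentNode->removeChild($node);
    }

    // saveHTML() without arguments returns a full document (doctype, <html>, <body>).
    return $doc->saveHTML();
}

$clean = removeElements(
    '<div><h1>Title</h1><p>Kept</p><script>alert(1);</script></div>',
    ['h1', 'script']
);
echo $clean;
```

Because loadHTML() builds a complete document, the returned string includes the wrapper elements the parser added. If only the fragment is needed, serializing the relevant node with `$doc->saveHTML($node)` is one option.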

Answer 2

Sar*_*raz 8

Try this:

preg_replace('/<h1[^>]*>([\s\S]*?)<\/h1[^>]*>/', '', '<h1>including this content</h1>');

Example:

echo preg_replace('/<h1[^>]*>([\s\S]*?)<\/h1[^>]*>/', '', 'Hello<h1>including this content</h1> There !!');

Output:

Hello There !!

HTML allows a plain `>` in attribute values. (5 upvotes)
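The pattern from this answer can also be parameterized by tag name. The following is a sketch, not the answerer's code: the helper `stripTagWithContent`, the `\b` boundary, and the case-insensitive `i` flag are additions assumed here for illustration. It remains a regex approach and shares the limitations regexes have with irregular HTML.

```php
<?php
// Hypothetical helper; name, signature, \b and the /i flag are assumptions of this sketch.
function stripTagWithContent(string $html, string $tag): string
{
    $name = preg_quote($tag, '/');

    // \b keeps 'h1' from also matching <h10>; the lazy [\s\S]*? stops at the
    // first closing tag with the same name.
    $pattern = '/<' . $name . '\b[^>]*>[\s\S]*?<\/' . $name . '\s*>/i';

    // preg_replace() returns null on error; fall back to the untouched input.
    return preg_replace($pattern, '', $html) ?? $html;
}

$result = stripTagWithContent('Hello<h1>including this content</h1> There !!', 'h1');
echo $result, PHP_EOL; // Hello There !!
```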

Answer 3

mač*_*ček 7

If you want to remove all tags along with the content they enclose:

$yourString = 'Hello <div>Planet</div> Earth. This is some <span class="foo">sample</span> content!';
$regex = '/<[^>]*>[^<]*<[^>]*>/';
echo preg_replace($regex, '', $yourString);
#=> Hello  Earth. This is some  content!

HTML attributes can contain < or >. So if your HTML is too messy, this method won't work and you'll need a DOM parser.

Regex explanation
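Nested elements are another case where this pattern falls short. A small sketch, using an input invented for this illustration:

```php
<?php
$regex = '/<[^>]*>[^<]*<[^>]*>/';

// Nested markup: the pattern pairs <div> with <b>, then </b> with </div>,
// so the text inside <b> is left behind.
$nested = '<div>Outer <b>inner</b> text</div> kept';
$result = preg_replace($regex, '', $nested);

echo $result, PHP_EOL; // inner kept
```

The word "inner" was inside the outer element and should have been removed, which is the kind of input where a DOM parser is the safer choice.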

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  [^>]*                    any character except: '>' (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  >                        '>'
--------------------------------------------------------------------------------
  [^<]*                    any character except: '<' (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  [^>]*                    any character except: '>' (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  >                        '>'

HTML allows a plain `>` in attribute values. (3 upvotes)

Archived:	15 years, 10 months ago
Views:	13208
Last recorded:	15 years, 10 months ago