Remove all attributes from HTML tags

Question

I have this HTML code:

<p style="padding:0px;">
<strong style="padding:0;margin:0;">hello</strong>
</p>

But it should become (for all possible HTML tags):

<p>
<strong>hello</strong>
</p>

Answer 1

gna*_*arf 146

Adapted from my answer to a similar question:

$text = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>';

echo preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $text);

// <p><strong>hello</strong></p>

RegExp breakdown:

/              # Start Pattern
 <             # Match '<' at beginning of tags
 (             # Start Capture Group $1 - Tag Name
  [a-z]         # Match 'a' through 'z'
  [a-z0-9]*     # Match 'a' through 'z' or '0' through '9' zero or more times
 )             # End Capture Group
 [^>]*?        # Match anything other than '>', zero or more times, non-greedy (won't eat the /)
 (\/?)         # Capture Group $2 - '/' if it is there
 >             # Match '>'
/i            # End Pattern - Case Insensitive

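Add some quoting and use the replacement text <$1$2>; it should strip any text after the tag name up to the end of the tag, whether that is /> or just >.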

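Please note that this will not necessarily work on ALL input, as the anti-HTML-plus-RegExp crowd will tell you. There are a few failure cases, most notably that <p style=">"> ends up as <p>">, plus a few other broken edge cases... I would recommend looking at Zend_Filter_StripTags as a more foolproof tag/attribute filter in PHP.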
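
The failure case mentioned above can be reproduced directly. The sketch below also shows one possible mitigation, a quote-aware variant of the pattern that skips over quoted attribute values. The variable names and the second pattern are illustrative assumptions, not part of the original answer, and the variant is still not a real HTML parser (comments, scripts, and unbalanced quotes can break it).

```php
<?php
// Pattern from the answer: stops at the first '>' even inside a quoted value.
$naive = '/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i';

// Hypothetical quote-aware variant: treats "..." and '...' as single units,
// so a '>' inside an attribute value does not end the tag early.
$quoteAware = '/<([a-z][a-z0-9]*)(?:[^>"\']|"[^"]*"|\'[^\']*\')*?(\/?)>/i';

$input = '<p style=">">hello</p>';

echo preg_replace($naive, '<$1$2>', $input), "\n";      // <p>">hello</p>
echo preg_replace($quoteAware, '<$1$2>', $input), "\n"; // <p>hello</p>
```

For input that is not known to be regular, a DOM-based approach remains the safer choice.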

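+999 for breaking down the regex! (58 upvotes)
@Jleagle Are you serious? There is already a note in the answer describing ways this regex breaks when parsing HTML. Sometimes parsing HTML with a regex is fine (for example, when the HTML is generated by a known system and is therefore very regular). If you are going to comment about not parsing HTML with regex, at least add something that is not already stated in the answer. (7 upvotes)
Regex should not be used on HTML (5 upvotes)
If you know the tags, you can do something like `$plain_value = preg_replace("/<(p|br)[^>]*?(\/?)>/i",'<$1>', $plain_value); ` (2 upvotes)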

Answer 2

Gor*_*don 66

Here is how to do it with native DOM:

$dom = new DOMDocument;                 // init new DOMDocument
$dom->loadHTML($html);                  // load HTML into it
$xpath = new DOMXPath($dom);            // create a new XPath
$nodes = $xpath->query('//*[@style]');  // Find elements with a style attribute
foreach ($nodes as $node) {              // Iterate over found elements
    $node->removeAttribute('style');    // Remove style attribute
}
echo $dom->saveHTML();                  // output cleaned HTML

If you want to remove all possible attributes from all possible tags, do

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//@*');
foreach ($nodes as $node) {
    $node->parentNode->removeAttribute($node->nodeName);
}
echo $dom->saveHTML();
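
As a rough sketch of how the snippet above might be wrapped for reuse on HTML fragments: saveHTML() on the whole document also emits the DOCTYPE and the <html><body> wrapper that loadHTML() adds, so this version serializes only the children of <body>. The function name stripAllAttributes is hypothetical, and the sketch assumes ASCII input (loadHTML() does not assume UTF-8 by default).

```php
<?php
// Hypothetical helper: strip every attribute from an HTML fragment.
function stripAllAttributes(string $html): string
{
    if (trim($html) === '') {
        return '';                                 // loadHTML() does not accept an empty string
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);  // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//@*') as $attr) {     // every attribute node in the document
        $attr->ownerElement->removeAttribute($attr->nodeName);
    }

    // loadHTML() wraps the fragment in <html><body>; emit only the body's children.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo stripAllAttributes('<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>');
```

Because the input goes through a real parser, quoted attribute values containing > are handled without special cases.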

Answer 3

Yac*_*oby 9

I would avoid using a regex, since HTML is not a regular language, and instead use an HTML parser such as Simple HTML DOM.

You can get the list of attributes an element has through its attr property. For example:

$html = str_get_html('<div id="hello">World</div>');
var_dump($html->find("div", 0)->attr);
/*
array(1) {
  ["id"]=>
  string(5) "hello"
}
*/

foreach ( $html->find("div", 0)->attr as &$value ){
    $value = null;
}

print $html;
//<div>World</div>

Archived:	15 years, 8 months ago
Viewed:	65469 times
Last activity:	7 years, 9 months ago