I have this HTML code:
<p style="padding:0px;">
<strong style="padding:0;margin:0;">hello</strong>
</p>
But it should become this (for all possible HTML tags):
<p>
<strong>hello</strong>
</p>
gna*_*arf 146
$text = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>';
echo preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $text);
// <p><strong>hello</strong></p>
RegExp breakdown:
/ # Start Pattern
< # Match '<' at beginning of tags
( # Start Capture Group $1 - Tag Name
[a-z] # Match 'a' through 'z'
[a-z0-9]* # Match 'a' through 'z' or '0' through '9' zero or more times
) # End Capture Group
[^>]*? # Match anything other than '>', zero or more times, non-greedy (won't eat the /)
(\/?) # Capture Group $2 - '/' if it is there
> # Match '>'
/i # End Pattern - Case Insensitive
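Put the pattern in a quoted string and use the replacement text <$1$2>: it strips any text after the tag name up to the end of the tag, either /> or just >.
Note that this won't necessarily work on all input, as the anti-HTML-with-RegExp crowd will tell you. There are a few failure cases, most notably <p style=">">, which ends up as <p>">, along with some other broken edge cases. For a more thoroughly tested tag/attribute filter in PHP, I would recommend looking at Zend_Filter_StripTags.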
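The same pattern can be wrapped in a small PHP helper to check how it treats self-closing, void, and closing tags. This is a sketch: the function name stripAttributesRegex is hypothetical, and the pattern is exactly the one from this answer.

```php
<?php
// Hypothetical helper wrapping the answer's pattern.
function stripAttributesRegex(string $html): string
{
    // $1 = tag name, $2 = the optional "/" of a self-closing tag.
    return preg_replace('/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i', '<$1$2>', $html);
}

// Closing tags such as </div> are left untouched because "/" cannot match [a-z].
$input = '<img src="a.png" alt="x" /><br><div class="c" id="d">t</div>';
echo stripAttributesRegex($input), "\n"; // <img/><br><div>t</div>
```

Because the lazy [^>]*? stops at the first point where (\/?)> can match, the trailing slash of <img ... /> lands in $2 and is kept.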
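The failure case from this caveat is easy to reproduce. The following sketch runs the same pattern on input where a > (or />) appears inside a quoted attribute value. The second input string is an illustrative example chosen for this sketch, not one from the answer.

```php
<?php
$pattern = '/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i';

// A ">" inside a quoted attribute value ends the match early,
// so the rest of the attribute leaks into the text content.
echo preg_replace($pattern, '<$1$2>', '<p style=">">text</p>'), "\n";
// <p>">text</p>

// A "/>" inside a value additionally turns the tag into a self-closing one.
echo preg_replace($pattern, '<$1$2>', '<a title="x/>y">link</a>'), "\n";
// <a/>y">link</a>
```

A real parser avoids this because it tokenizes quoted attribute values instead of scanning for the next >.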
Gor*_*don 66
Here is how to do it with the native DOM extension:
$dom = new DOMDocument; // init new DOMDocument
$dom->loadHTML($html); // load HTML into it
$xpath = new DOMXPath($dom); // create a new XPath
$nodes = $xpath->query('//*[@style]'); // Find elements with a style attribute
foreach ($nodes as $node) { // Iterate over found elements
$node->removeAttribute('style'); // Remove style attribute
}
echo $dom->saveHTML(); // output cleaned HTML
If you want to remove all possible attributes from all possible tags, do:
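By default, loadHTML() wraps a fragment in a doctype plus <html><body>, so saveHTML() returns more than the original snippet. The following sketch packages the same XPath approach as a function and passes LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD to suppress those wrappers. It assumes PHP 5.4+ with libxml 2.7.8+ and a fragment that has a single root element (NOIMPLIED can mangle fragments with several top-level elements). The name removeStyleAttributes is hypothetical.

```php
<?php
// Hypothetical helper; the technique is the XPath approach from this answer.
function removeStyleAttributes(string $html): string
{
    $dom = new DOMDocument();
    // Don't add implied <html><body> wrappers or a default doctype.
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//*[@style]') as $node) {
        $node->removeAttribute('style'); // other attributes are kept
    }
    // saveHTML() appends a trailing newline, so trim it off.
    return trim($dom->saveHTML());
}

echo removeStyleAttributes('<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>'), "\n";
// <p><strong>hello</strong></p>
```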
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//@*');
foreach ($nodes as $node) {
$node->parentNode->removeAttribute($node->nodeName);
}
echo $dom->saveHTML();
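I would avoid regular expressions, since HTML is not a regular language, and use an HTML parser such as Simple HTML DOM instead.
You can get an element's list of attributes through the object's attr property. For example: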
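In practice you often want to strip everything except a few safe attributes. The following sketch extends the "remove all attributes" loop with an optional keep-list. The $keep parameter and the function name removeAttributes are hypothetical additions, and passing [] reproduces the answer's behaviour. It uses DOMAttr::$ownerElement to reach the element that owns each attribute, and the same LIBXML flags and single-root assumption as above.

```php
<?php
// Removes every attribute whose (lower-cased) name is not in $keep.
function removeAttributes(string $html, array $keep = []): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $xpath = new DOMXPath($dom);
    // query() returns a snapshot node list, so removing attributes
    // while iterating over it does not skip entries.
    foreach ($xpath->query('//@*') as $attr) {
        if (!in_array(strtolower($attr->nodeName), $keep, true)) {
            $attr->ownerElement->removeAttribute($attr->nodeName);
        }
    }
    return trim($dom->saveHTML());
}

echo removeAttributes('<a href="/x" class="c" style="color:red">y</a>', ['href']), "\n";
// <a href="/x">y</a>
```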
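include 'simple_html_dom.php'; // Simple HTML DOM library; adjust the path as needed
$html = str_get_html('<div id="hello">World</div>');
var_dump($html->find("div", 0)->attr);
/*
array(1) {
  ["id"]=>
  string(5) "hello"
}
*/
$div = $html->find("div", 0);
foreach ($div->attr as $name => $value) {
    $div->attr[$name] = null; // a null attribute is left out when the element is printed
}
print $html;
// <div>World</div>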
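If you would rather not pull in a third-party library, the per-element approach can also be expressed with PHP's built-in DOM extension. In this sketch the helper name clearElementAttributes is hypothetical. It copies the attribute names first because $el->attributes is tied to the element, and removing entries while iterating over it is unsafe.

```php
<?php
// Removes all attributes from a single element using the native DOM extension.
function clearElementAttributes(DOMElement $el): void
{
    // Collect names first; don't remove while iterating $el->attributes.
    $names = [];
    foreach ($el->attributes as $attr) {
        $names[] = $attr->nodeName;
    }
    foreach ($names as $name) {
        $el->removeAttribute($name);
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<div id="hello" class="x">World</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$div = $dom->getElementsByTagName('div')->item(0);
clearElementAttributes($div);
echo trim($dom->saveHTML()), "\n"; // <div>World</div>
```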