Remove comments from HTML source code

Lui*_*uis 11 php curl

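I know how to fetch HTML source code with cURL, but I want to remove the comments from the HTML document (I mean everything between <!-- and -->). Also, it would be great if I could get only the BODY of the HTML document. Thanks.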

Yos*_*shi 27

Try PHP DOM:

$html = '<html><body><!--a comment--><div>some content</div></body></html>'; // put your cURL result here

$dom = new DOMDocument();
$dom->loadHTML($html);

// Remove every comment node in the document
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//comment()') as $comment) {
    $comment->parentNode->removeChild($comment);
}

// Serialize only the <body> element
$body = $xpath->query('//body')->item(0);
$newHtml = $body instanceof DOMNode ? $dom->saveXML($body) : 'something failed';

var_dump($newHtml);

Output:

string(36) "<body><div>some content</div></body>"
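Pages fetched with cURL are often not well-formed, so the parser may emit warnings, and serializing with saveXML() produces XML-style output such as <br/>. The following is a minimal sketch of the same approach wrapped in a function, assuming you want parser warnings suppressed and HTML-style serialization. The function name bodyWithoutComments is hypothetical and not part of the original answer; libxml_use_internal_errors(), libxml_clear_errors() and DOMDocument::saveHTML() are standard PHP functions.

```php
<?php
// Hypothetical helper (not from the original answer): returns the <body>
// markup of $html with all comment nodes removed, or null on failure.
function bodyWithoutComments(string $html): ?string
{
    // loadHTML() rejects an empty string, so bail out early.
    if ($html === '') {
        return null;
    }

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        return null;
    }

    // Remove every comment node, as in the answer above.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//comment()') as $comment) {
        $comment->parentNode->removeChild($comment);
    }

    // saveHTML() with a node argument serializes only that node, using HTML rules.
    $body = $xpath->query('//body')->item(0);
    return $body instanceof DOMNode ? $dom->saveHTML($body) : null;
}

// Usage: put your cURL result in $html.
$html = '<html><body><!--a comment--><div>some content<br></div></body></html>';
var_dump(bodyWithoutComments($html));
```

Returning null instead of the string 'something failed' lets the caller distinguish a failure from real markup with a simple null check.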