How to remove all tags except an allowed list from HTML parsed in PHP

Question

Fin*_*ish 1 php dom html-parsing domdocument php-parser

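I am parsing HTML in PHP. Because I have no control over the original content, I want to strip styles and unnecessary tags while keeping the content and a short list of tags, namely:

p, img, iframe (and perhaps a few others)

I know I can remove a given tag (see the code I use below), but I don't necessarily know which tags might appear and I don't want to build a huge list of possibilities, so I would like to remove everything except my allowed list.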

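// Unwrap a node: move its children up into its parent, then remove the node itself.
function DOMRemove(DOMNode $from) {
    // A plain while loop avoids dereferencing null when $from has no children.
    while ($from->firstChild !== null) {
        $from->parentNode->insertBefore($from->firstChild, $from);
    }

    $from->parentNode->removeChild($from);
}

$dom = new DOMDocument;
$dom->loadHTML($html);

$nodes = $dom->getElementsByTagName('span');

// The node list is live, so copy it to an array before changing the tree.
foreach (iterator_to_array($nodes) as $node) {
    DOMRemove($node);
}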


Answer 1

Koa*_*ung 5

As cpattersonv1 said above, you can simply use strip_tags() to do the job.

<?php

// strip all other tags except mentioned (p, img, iframe)
$html_result = strip_tags($html, '<p><img><iframe>');

?>
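
To see what this call keeps and what it drops, here is a small check. The sample markup in `$html` is made up for illustration and is not from the question. Note that strip_tags() removes only the tags themselves: text inside a dropped tag stays, and attributes on the allowed tags are left untouched.

```php
<?php

// Hypothetical sample input, not from the original question.
$html = '<div class="box"><p style="color:red">Hello <span>world</span></p>'
      . '<img src="a.png"></div>';

// Same call as in the answer: keep only p, img and iframe.
$html_result = strip_tags($html, '<p><img><iframe>');

echo $html_result, PHP_EOL;
// prints: <p style="color:red">Hello world</p><img src="a.png">
```

The `div` and `span` tags are gone, but the `style` attribute on `p` survives, so inline styles need a separate step if they must be removed as well.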

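
If attributes such as `style` must go too, a DOMDocument pass along the lines of the question's own unwrapping approach is one possible alternative. What follows is only a minimal sketch under some assumptions: the function name `keepAllowedTags`, its parameters, and the sample markup are hypothetical, `script` and `style` elements are assumed to be unwanted together with their contents, and character-encoding handling is left out.

```php
<?php

// Minimal sketch: unwrap every element that is not allowlisted and drop
// "style" attributes. Names and sample input are illustrative assumptions.
function keepAllowedTags(string $html, array $allowed): string
{
    $dom = new DOMDocument();

    // Real-world markup is often invalid; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Snapshot the elements first, because the tree is modified while looping.
    $xpath = new DOMXPath($dom);
    $elements = iterator_to_array($xpath->query('.//*', $body));

    foreach ($elements as $el) {
        $name = strtolower($el->nodeName);

        if ($name === 'script' || $name === 'style') {
            // Drop these together with their text content.
            $el->parentNode->removeChild($el);
        } elseif (in_array($name, $allowed, true)) {
            // Allowed tag: keep it, but strip inline styling.
            $el->removeAttribute('style');
        } else {
            // Unwrap: move the children up, then remove the empty shell.
            while ($el->firstChild !== null) {
                $el->parentNode->insertBefore($el->firstChild, $el);
            }
            $el->parentNode->removeChild($el);
        }
    }

    // Serialize only what ended up inside <body>.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Hypothetical sample input.
$sample = '<div><p style="color:red">Hello <span>world</span></p>'
        . '<style>p{margin:0}</style><img src="a.png"></div>';

$clean = keepAllowedTags($sample, ['p', 'img', 'iframe']);
echo $clean, PHP_EOL;
```

For the sample above, the result should contain the paragraph and the image without the `div`, the `span`, or any styling. Other attributes (for example `class` or `onclick`) are not touched by this sketch and would need the same treatment as `style`.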

Archived:	12 years, 8 months ago
Views:	2522
Last recorded:	12 years, 5 months ago