Getting the entire BODY content using PHP DOMDocument

Nit*_*kar 8 php

I want to get the entire content of the body tag using DOMDocument.

I am using the following code:

$dom = new DOMDocument;

// Load the HTML into the object
$dom->loadHTML($html);

// Fetch the body element by its tag name and read its nodeValue
$tables = $dom->getElementsByTagName('body')->item(0)->nodeValue;

This gives me only the text. I want the entire body content, markup included.

Vol*_*erK 12

You can pass the body DOMElement to DOMDocument::saveHTML() (note that DOMDocument::saveHTMLFile() takes only a filename, not a node), for example:

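<?php
$doc = new DOMDocument;
$doc->loadHTMLFile('http://stackoverflow.com');

// getElementsByTagName() returns a DOMNodeList; take its first <body> element
$body = $doc->getElementsByTagName('body');
if ($body && $body->length > 0) {
    $body = $body->item(0);
    // Passing a node serializes only that subtree
    echo $doc->saveHTML($body);
}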

prints

Warning: DOMDocument::loadHTMLFile(): Unexpected end tag : p in http://stackoverflow.com, line: 2843 [...]
<body class="home-page">
<noscript><div id="noscript-padding"></div></noscript>
<div id="notify-container"></div>
<div id="overlay-header"></div>
<div id="custom-header"></div>
<div class="container">
        <div id="header">
            <div id="portalLink">
[...]

  • You will still get the "outer" <body></body> tags, though. You need to strip them with a regular expression, or see another solution here: /sf/answers/787842751/ (2 upvotes)
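As an alternative to the regular expression mentioned in the comment above, one way to drop the outer <body></body> tags is to serialize each child node of the body separately and join the results. The following is only a minimal sketch, not the solution from the linked answer: the helper name bodyInnerHtml() and the sample markup are made up for illustration, and it assumes PHP 7 or later (for the type declarations).

```php
<?php
// Hypothetical helper (not from the thread): returns the markup inside
// <body>, without the <body>...</body> wrapper itself.
function bodyInnerHtml(string $html): string
{
    $doc = new DOMDocument;
    $doc->loadHTML($html);

    // item(0) returns null when the list is empty
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize every direct child of <body> and concatenate the pieces
    $inner = '';
    foreach ($body->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

// Sample markup, assumed for illustration only
$html = '<html><body class="home-page"><p>Hello <b>world</b></p><div id="x">text</div></body></html>';
echo bodyInnerHtml($html);
```

The serializer may insert line breaks between block-level elements, so the output is not guaranteed to match the input byte for byte.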

Spo*_*oky 5

$dom = new DOMDocument;
$dom->loadHTML($html);

// ... change, replace ...
// ... mock, traverse ...

// Assumes <body> is the last child of the root <html> element
$body = $dom->documentElement->lastChild;
echo $dom->saveHTML($body);