How to use the PHP DOM parser

chr*_*ris 13 php dom html-parsing

I'm new to DOM parsing in PHP:
I have an HTML file that I'm trying to parse. It has a bunch of DIVs like this:

<div id="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div id="interestingbox"> 
......

I'm trying to get the contents of the many div boxes using PHP. How can I do this with the DOM parser?

Thanks!

ape*_*ari 20

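First I have to tell you that you can't use the same id on two different divs; that is what classes are for. Every element should have a unique id.

Code to get the contents of the div with id="interestingbox":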

$html = '
<html>
<head></head>
<body>
<div id="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div id="interestingbox2"><a href="#">a link</a></div>
</body>
</html>';


$dom_document = new DOMDocument();

$dom_document->loadHTML($html);

//use DOMXpath to navigate the html with the DOM
$dom_xpath = new DOMXpath($dom_document);

// if you want to get the div with id=interestingbox
$elements = $dom_xpath->query("*/div[@id='interestingbox']");

// query() returns a DOMNodeList, or false if the expression is invalid (never null)
if ($elements !== false) {

  foreach ($elements as $element) {
    echo "\n[". $element->nodeName. "]";

    $nodes = $element->childNodes;
    foreach ($nodes as $node) {
      echo $node->nodeValue. "\n";
    }

  }
}

// OUTPUT (schematic: the script prints no braces, and the exact whitespace differs)
[div]  {
        Content1
        Content2
}

An example using classes:

$html = '
<html>
<head></head>
<body>
<div class="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div class="interestingbox"><a href="#">a link</a></div>
</body>
</html>';

//the same as before.. just change the xpath

[...]

$elements = $dom_xpath->query("*/div[@class='interestingbox']");

[...]

// OUTPUT (schematic: the script prints no braces, and the exact whitespace differs)
[div]  {
        Content1
        Content2
}

[div]  {
a link
}

For more details, see the DOMXPath page.
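Since the class example above elides the middle with [...], here is a minimal end-to-end sketch of the same idea. A few things in it are my assumptions rather than part of the answer: the sample markup (I added a second class, "featured", to the first box), the variable names, the `//div` form of the query (matches at any depth instead of relying on the default context node), and the `contains(concat(...))` idiom, which matches a class as a whole word even when the attribute lists several classes.

```php
<?php
// Hypothetical sample markup, modelled on the snippet in the question.
$html = '
<html>
<head></head>
<body>
<div class="interestingbox featured">
   <div class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>
<div class="interestingbox"><a href="#">a link</a></div>
</body>
</html>';

// Collect libxml parse warnings quietly; real-world HTML is rarely clean.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Any div, at any depth, whose class list contains the word "interestingbox".
$boxes = $xpath->query(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' interestingbox ')]"
);

$contents = [];
if ($boxes !== false) {
    foreach ($boxes as $box) {
        $texts = [];
        // Relative query: every non-blank text node below this box.
        foreach ($xpath->query('.//text()[normalize-space()]', $box) as $textNode) {
            $texts[] = trim($textNode->nodeValue);
        }
        $contents[] = $texts;
    }
}

print_r($contents);
```

Each entry of `$contents` holds the text pieces of one box, so the first box gives Content1 and Content2 and the second gives "a link". Selecting only non-blank text nodes avoids the stray whitespace you get when echoing `nodeValue` of every child node.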


chr*_*ris 6

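I used simplehtmldom to get started:

// Assumes the Simple HTML DOM library file is on the include path.
include 'simple_html_dom.php';

// file_get_html() needs a full URL (or a local file path), so include the scheme.
$html = file_get_html('http://example.com/');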
foreach ($html->find('div[id=interestingbox]') as $result)
{
    echo $result->innertext;
}
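If you would rather stay with the built-in DOM extension, something close to simplehtmldom's `innertext` can be sketched by serializing each child of the matched node. The helper name `innerHtml` and the sample markup are hypothetical, just for illustration; `DOMDocument::saveHTML()` accepting a node argument requires PHP 5.3.6 or later.

```php
<?php
// Hypothetical helper: returns the markup inside a node, without the node's own tag.
function innerHtml(DOMNode $node)
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes just that subtree.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Hypothetical sample markup, modelled on the answer above.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div id="interestingbox2"><a href="#">a link</a></div></body></html>');

$xpath = new DOMXPath($doc);
$matches = $xpath->query("//div[@id='interestingbox2']");

$inner = '';
if ($matches !== false && $matches->length > 0) {
    $inner = innerHtml($matches->item(0));
}

echo $inner, "\n";
```

For the sample above this prints the `<a>` element's markup. Use `textContent` instead if you only want the text.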