How to parse HTML in PHP?

lar*_*dev 7 html php parsing dom

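I know we can use PHP DOM to parse HTML in PHP, and I found a lot of questions about it on Stack Overflow. But I have a specific requirement. I have HTML content like the following: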

<p class="Heading1-P">
    <span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 2</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 3</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 3</span>
</p>

I want to parse the HTML above and save its content into two separate arrays, $heading and $content, like this:

$heading = array('Chapter 1','Chapter 2','Chapter 3');
$content = array('This is chapter 1','This is chapter 2','This is chapter 3');

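I could do this easily with jQuery, but I'm not sure that is the right approach. It would be great if someone could point me in the right direction. Thanks in advance.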

saj*_*i89 15

I used DOMDocument and DOMXPath to get a solution; you can find it below:

<?php
$dom = new DomDocument();
$test='<p class="Heading1-P">
    <span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 2</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 3</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 3</span>
</p>';

$dom->loadHTML($test);
$xpath = new DOMXPath($dom);
$heading = parseToArray($xpath, 'Heading1-H');
$content = parseToArray($xpath, 'Normal-H');

var_dump($heading);
echo "<br/>";
var_dump($content);
echo "<br/>";

// Return the text of every <span> whose class attribute is exactly $class
function parseToArray(DOMXPath $xpath, string $class): array
{
    $xpathquery = "//span[@class='" . $class . "']";
    $elements = $xpath->query($xpathquery);

    $resultarray = array();
    // query() returns false (never null) when the expression is malformed
    if ($elements !== false) {
        foreach ($elements as $element) {
            $resultarray[] = $element->textContent;
        }
    }
    return $resultarray;
}

Live result: http://saji89.codepad.org/2TyOAibZ
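Collecting headings and contents in two independent queries relies on both lists staying the same length and order. A minimal sketch of an alternative, assuming the layout from the question where every Heading1-P paragraph is immediately followed by its Normal-P paragraph, walks from each heading to its next sibling with a relative XPath query and only then splits the result into the two arrays:

```php
<?php
// Sketch: build heading => content pairs by walking from each heading
// paragraph to the <p> that immediately follows it. Assumes every
// Heading1-P is directly followed by its Normal-P, as in the question.
$html = '<p class="Heading1-P"><span class="Heading1-H">Chapter 1</span></p>
<p class="Normal-P"><span class="Normal-H">This is chapter 1</span></p>
<p class="Heading1-P"><span class="Heading1-H">Chapter 2</span></p>
<p class="Normal-P"><span class="Normal-H">This is chapter 2</span></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$chapters = [];
foreach ($xpath->query("//p[@class='Heading1-P']") as $headingP) {
    $title = trim($headingP->textContent);
    // Evaluated relative to $headingP: the first following <p> sibling,
    // kept only if it carries the Normal-P class
    $next = $xpath->query("following-sibling::p[1][@class='Normal-P']", $headingP);
    $chapters[$title] = $next->length > 0 ? trim($next->item(0)->textContent) : '';
}

// Split back into the two arrays the question asks for
$heading = array_keys($chapters);
$content = array_values($chapters);
print_r($heading);
print_r($content);
```

Keying by title means two chapters with the same heading would overwrite each other; collecting [title, text] pairs in a plain list avoids that.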


Pau*_*ich 13

Try taking a look at PHP Simple HTML DOM Parser.

It has a great jQuery-like syntax, so you can easily select any element you want by ID or class.

// include/require the Simple HTML DOM parser file (adjust the path to where you saved it)
require_once 'simple_html_dom.php';

$html_string = '
    <p class="Heading1-P">
        <span class="Heading1-H">Chapter 1</span>
    </p>
    <p class="Normal-P">
        <span class="Normal-H">This is chapter 1</span>
    </p>
    <p class="Heading1-P">
        <span class="Heading1-H">Chapter 2</span>
    </p>
    <p class="Normal-P">
        <span class="Normal-H">This is chapter 2</span>
    </p>
    <p class="Heading1-P">
        <span class="Heading1-H">Chapter 3</span>
    </p>
    <p class="Normal-P">
        <span class="Normal-H">This is chapter 3</span>
    </p>';
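$html = str_get_html($html_string);
$heading = array();
$content = array();
foreach ($html->find('span') as $element) {
    if ($element->class === 'Heading1-H') {
        $heading[] = $element->innertext;
    } elseif ($element->class === 'Normal-H') {
        $content[] = $element->innertext;
    }
}
// Release the parser's internal references to avoid Simple HTML DOM's known memory leaks
$html->clear();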

  • !!NOTE!! Not using "->innertext" causes memory leaks. (3 upvotes)
  • This is a simpler option than DomDocument and produces more readable code. (2 upvotes)
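Both the $element->class === 'Heading1-H' comparison and an XPath test like @class='Heading1-H' only match when the class attribute is exactly that string, whereas a jQuery-style .Heading1-H selector also matches class="Heading1-H big". A minimal sketch of the usual XPath idiom for whole-token class matching with the bundled DOMDocument/DOMXPath; findByClass is a hypothetical helper, not part of any library, and the class name is interpolated unescaped, so it assumes trusted input:

```php
<?php
// Hypothetical helper (not from any library): jQuery-style ".class" lookup
// using PHP's bundled DOMDocument/DOMXPath. Assumes $class is a trusted,
// quote-free class name because it is interpolated into the XPath string.
function findByClass(DOMXPath $xpath, string $class): array
{
    // Pad the normalized class list with spaces so ' Heading1-H ' only
    // matches a whole class token, not e.g. 'Heading1-Hidden'
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$dom = new DOMDocument();
$dom->loadHTML(
    '<p class="Heading1-P"><span class="Heading1-H big">Chapter 1</span></p>'
    . '<p class="Normal-P"><span class="Normal-H">This is chapter 1</span></p>'
);
$xpath = new DOMXPath($dom);

$heading = findByClass($xpath, 'Heading1-H'); // matched despite the extra "big" class
$content = findByClass($xpath, 'Normal-H');
print_r($heading);
print_r($content);
```

Padding both sides with spaces is what stops a class like Heading1-H from matching the prefix of another token.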

8ct*_*pus 9

Here is another way to parse HTML: DiDOM, which offers significantly better performance in terms of speed and memory footprint.

composer require imangazaliev/didom
<?php

use DiDom\Document;

require_once('vendor/autoload.php');

$html = <<<HTML
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 2</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 3</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 3</span>
</p>
HTML;

$document = new Document($html);

// find chapter headings
$elements = $document->find('.Heading1-H');

$headings = [];

foreach ($elements as $element) {
    $headings[] = $element->text();
}

// find chapter texts
$elements = $document->find('.Normal-H');

$chapters = [];

foreach ($elements as $element) {
    $chapters[] = $element->text();
}

echo("Headings\n");

foreach ($headings as $heading) {
    echo("- {$heading}\n");
}

echo("Chapter texts\n");

foreach ($chapters as $chapter) {
    echo("- {$chapter}\n");
}

  • You've got to love it when you find an old question on SO with a really good modern answer. That DOM parser is great, cheers. (3 upvotes)

Gre*_*eso 5

One option is to use DOMDocument and DOMXPath. They do have a learning curve, but once you get past it you'll be very pleased with what you can achieve.

Read the following on php.net:

http://php.net/manual/en/class.domdocument.php

http://php.net/manual/en/class.domxpath.php

Hope this helps.

  • Don't. Use. PHP. DOM. This answer is old. PHP DOM is not compatible with 2020+ HTML. (3 upvotes)
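If the HTML5 concern in the comment above applies, PHP 8.4 added a spec-compliant HTML5 parser, Dom\HTMLDocument, with CSS selector support. A minimal sketch, assuming PHP 8.4 or later for that path and falling back to the libxml-based DOMDocument on older versions; extractByClass is a hypothetical helper name, and LIBXML_NOERROR is passed on the assumption that parse-error warnings (such as a missing doctype in a fragment) are not wanted:

```php
<?php
// Sketch: prefer the HTML5 parser Dom\HTMLDocument (PHP >= 8.4) and fall
// back to the HTML 4 era DOMDocument otherwise. extractByClass is a
// hypothetical helper; class names are assumed to be trusted input.
function extractByClass(string $html, string $class): array
{
    $texts = [];

    if (class_exists('Dom\HTMLDocument')) {
        // LIBXML_NOERROR suppresses parse-error warnings, e.g. for a
        // fragment without <!DOCTYPE html>
        $doc = Dom\HTMLDocument::createFromString($html, LIBXML_NOERROR);
        foreach ($doc->querySelectorAll('.' . $class) as $element) {
            $texts[] = trim($element->textContent);
        }
        return $texts;
    }

    // Fallback: silence libxml parse warnings, which older libxml releases
    // emit for HTML5-only tags such as <section>
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
    foreach ($xpath->query($query) as $element) {
        $texts[] = trim($element->textContent);
    }
    return $texts;
}

// A fragment wrapped in an HTML5 element
$html = '<section><p class="Heading1-P"><span class="Heading1-H">Chapter 1</span></p>'
    . '<p class="Normal-P"><span class="Normal-H">This is chapter 1</span></p></section>';

print_r(extractByClass($html, 'Heading1-H'));
print_r(extractByClass($html, 'Normal-H'));
```

Both branches return the same arrays for this input; the HTML5 path matters for documents where the legacy parser would build a different tree.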