PHP DomDocument - how to do getElementById with a partial match?

Sol*_*son 3 php getelementbyid domdocument

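Is there a way to get all elements whose id partially matches? For example, I want to get every HTML element on a web page whose id attribute starts with msg_ but can be anything after that.

Here is what I have so far: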

$doc = new DomDocument;

// We need to validate our document before referring to the id
$doc->validateOnParse = true;
$doc->loadHtml(file_get_contents('{URL IS HERE}'));
foreach($doc->getElementById('msg_') as $element) {
   foreach($element->getElementsByTagName('a') as $link)
   {
      echo $link->nodeValue . "\n";
   }
}

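But I need to figure out how to do a partial id match with this bit: $doc->getElementById('msg_'), or whether there is some other way to accomplish this...?

Basically, I need to grab all 'a' tags that are children of elements whose id starts with msg_. Technically there will always be only one a tag, but I don't know how to grab just the first child, which is why I'm also using a foreach on it.

Is this possible with the PHP DomDocument class?

Here is the code I'm using now, which doesn't work either: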

$str = '';
$filename = 'http://dream-portal.net/index.php/board,65.0.html';
@set_time_limit(0);

$fp = fopen($filename, 'rb');
while (!feof($fp))
{
    $str .= fgets($fp, 16384);
}
fclose($fp);

$doc = new DOMDocument();
$doc->loadXML($str);

$selector = new DOMXPath($doc);

$elements = $selector->query('//row[starts-with(@id, "msg_")]');

foreach ($elements as $node) {
    var_dump($node->nodeValue) . PHP_EOL;
}

The HTML looks like this (the part I'm after is in the span tag):

<td class="subject windowbg2">
<div>
  <span id="msg_6555">
    <a href="http://dream-portal.net/index.php?topic=834.0">Poll 1.0</a>
  </span>
  <p>
    Started by
    <a href="http://dream-portal.net/index.php?action=profile;u=1" title="View the profile of SoLoGHoST">SoLoGHoST</a>
    <small id="pages6555">
      «
      <a class="navPages" href="http://dream-portal.net/index.php?topic=834.0">1</a>
      <a class="navPages" href="http://dream-portal.net/index.php?topic=834.15">2</a>
        »
    </small>

                        with 963 Views

  </p>
</div>
</td>

It's the <span id="msg_ part, and there are many of them (at least 15 on the HTML page).

hek*_*mgl 5

Use this:

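$str = file_get_contents('http://dream-portal.net/index.php/board,65.0.html');

$doc = new DOMDocument();
// loadHTML() (not loadXML()) copes with real-world markup; @ silences its parser warnings
@$doc->loadHTML($str);

$selector = new DOMXPath($doc);

// //* matches any element; starts-with() keeps those whose id begins with "msg_"
foreach ($selector->query('//*[starts-with(@id, "msg_")]') as $node) {
    var_dump($node->nodeValue);
}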

This gives you:

string(8) "Poll 1.0"
string(12) "Shoutbox 2.2"
string(24) "Polaroid Attachments 1.6"
string(24) "Featured News Slider 1.3"
string(17) "Image Resizer 1.0"
string(8) "Blog 2.2"
string(13) "RSS Feeds 1.0"
string(19) "Adspace Manager 1.2"
string(21) "Facebook Like Box 1.0"
string(15) "Price Table 1.0"
string(13) "SMF Links 1.0"
string(19) "Download System 1.2"
string(16) "[*]Site News 1.0"
string(12) "Calendar 1.3"
string(16) "Page Peel Ad 1.1"
string(20) "Sexy Bookmarks 1.0.1"
string(15) "Forum Staff 1.2"
string(21) "Facebook Comments 1.0"
string(15) "Attachments 1.4"
string(25) "YouTube Channels 0.9 Beta"
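
The question also asks how to grab just the first a tag inside each matched element, rather than the element's whole text. Here is a minimal sketch of one way that could be done, by running a second XPath query relative to each match and taking item(0). It is not part of the original answer. The function name firstLinksByIdPrefix, the inline sample markup, and the second row (msg_6556, topic=835.0) are made up for illustration, and libxml_use_internal_errors() is swapped in for the @ operator as another way to keep parser warnings quiet.

```php
<?php
// Sample markup modelled on the question's snippet.
// The second row (msg_6556, topic=835.0) is hypothetical, added for illustration.
$html = <<<HTML
<table><tr>
<td class="subject windowbg2"><div>
  <span id="msg_6555"><a href="http://dream-portal.net/index.php?topic=834.0">Poll 1.0</a></span>
  <p>Started by <a href="http://dream-portal.net/index.php?action=profile;u=1">SoLoGHoST</a></p>
</div></td>
<td class="subject windowbg2"><div>
  <span id="msg_6556"><a href="http://dream-portal.net/index.php?topic=835.0">Shoutbox 2.2</a></span>
</div></td>
</tr></table>
HTML;

/**
 * Hypothetical helper: returns [id => ['text' => ..., 'href' => ...]] for the
 * first <a> found inside every element whose id starts with $prefix.
 * Assumes $prefix contains no double quotes (it is pasted into the XPath string).
 */
function firstLinksByIdPrefix(string $html, string $prefix): array
{
    // Collect libxml warnings internally instead of suppressing them with @
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];

    foreach ($xpath->query('//*[starts-with(@id, "' . $prefix . '")]') as $element) {
        // Query relative to the matched element; item(0) is its first descendant <a>
        $a = $xpath->query('.//a', $element)->item(0);
        if ($a instanceof DOMElement) {
            $links[$element->getAttribute('id')] = [
                'text' => trim($a->nodeValue),
                'href' => $a->getAttribute('href'),
            ];
        }
    }

    return $links;
}

print_r(firstLinksByIdPrefix($html, 'msg_'));
```

Note that getElementById() returns a single element (or null), not a list, which is why the original foreach over it could not work; the XPath query returns a DOMNodeList that can be iterated.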