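I have been trying to parse HTML5 code so that I can set attributes/values within it, but it seems DOMDocument (PHP 5.3) does not support tags like <nav> and <section>.
Is there any way to parse this HTML in PHP and manipulate the code?
Code to reproduce: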
<?php
$dom = new DOMDocument();
$dom->loadHTML("<!DOCTYPE HTML>
<html><head><title>test</title></head>
<body>
<nav>
<ul>
<li>first
<li>second
</ul>
</nav>
<section>
...
</section>
</body>
</html>");
Errors:
Warning: DOMDocument::loadHTML(): Tag nav invalid in Entity, line: 4 in /home/wbkrnl/public_html/new-mvc/1.php on line 17
Warning: DOMDocument::loadHTML(): Tag section invalid in Entity, line: 10 in /home/wbkrnl/public_html/new-mvc/1.php on line 17
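One commonly used workaround, sketched here under assumptions, is to let libxml collect these warnings internally instead of emitting them. The parser still builds nodes for `<nav>` and `<section>`, so attributes can be set afterwards. The markup is condensed from the reproduction above, the `main-menu` class is purely illustrative, and calling `saveHTML()` with a node argument assumes PHP 5.3.6 or later.

```php
<?php
// Condensed version of the markup from the question.
$html = '<!DOCTYPE html><html><head><title>test</title></head>'
      . '<body><nav><ul><li>first<li>second</ul></nav>'
      . '<section>...</section></body></html>';

// Collect libxml errors internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Keep the collected errors for inspection, then reset libxml's state.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The HTML5 elements are ordinary DOM nodes and can be modified.
$nav = $dom->getElementsByTagName('nav')->item(0);
$nav->setAttribute('class', 'main-menu'); // illustrative attribute value

echo $dom->saveHTML($nav), "\n";
```

Whether `$errors` contains anything depends on the libxml2 version PHP was built against, so the sketch does not rely on its contents.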
I want to parse the page at http://dizli.com/dizli/db.html using PHP.
But when I write this code,
$url = "http://dizli.com/dizli/db.html";
$dom = new DOMDocument();
$html = $dom->loadHTMLFile($url);
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('table');
$tr = $tables->item(2)->getElementsByTagName('tr');
$rows = $tables->item(0)->getElementsByTagName('td');
foreach($rows as $row)
{
$movie = $row->getElementsByTagName('b');
echo $movie;}
I get a bunch of errors:
Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: font and td in http://dizli.com/dizli/db.html, line: 54 in C:\development\app_server\C7\Lib\Tools\News.php on line 93
Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: Opening and ending tag mismatch: font and b in http://dizli.com/dizli/db.html, line: 81 in C:\development\app_server\C7\Lib\Tools\News.php on line 93
Warning: DOMDocument::loadHTMLFile() …
I want to extract the string "hinson lou ann" from:
<div class='owner-name'>hinson lou ann</div>
When I run the following:
$html = "http://gisapps.co.union.nc.us/ws/rest/v2/cm_iw.ashx?gid=12339";
$doc = new DOMDocument();
$doc->loadHTMLFile($html);
$xpath = new DOMXpath($doc);
$elements = $xpath->query("*/div[@class='owner-name']");
if (!is_null($elements)) {
foreach ($elements as $element) {
echo "<br/>[" . $element->nodeName . "]";
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue . "\n";
}
}
}
I get an error:
Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: htmlParseEntityRef: no name in http://gisapps.co.union.nc.us/ws/rest/v2/cm_iw.ashx?gid=12339, line: 1 in /home... on line ...
which refers to the loadHTMLFile line.
Note: the file is not valid HTML; it contains only div tags. Should I load the file and then wrap its contents in HTML tags, or what?
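A minimal sketch of one possible approach, assuming the response is a bare fragment of `div` elements: `loadHTML()` wraps a fragment in `html` and `body` elements by itself, so no manual wrapping should be needed, and a descendant query (`//div`) finds the element wherever it ends up in that tree. The fragment below is a made-up stand-in for the real response; its second `div`, with a bare `&`, is included only because an unescaped ampersand is a typical trigger for entity warnings of this kind.

```php
<?php
// Hypothetical stand-in for the fragment returned by the service.
$fragment = "<div class='owner-name'>hinson lou ann</div>"
          . "<div class='note'>Smith & Sons</div>";

// Keep parser complaints out of the output while loading.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($fragment); // html/body wrappers are added automatically

libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);

// "//div" searches the whole document rather than one fixed depth.
$owners = array();
foreach ($xpath->query("//div[@class='owner-name']") as $div) {
    $owners[] = trim($div->textContent);
}

print_r($owners);
```

The original expression `*/div[...]` only matches a `div` that is a grandchild of the document node, which is not where the parser places it once `body` has been added.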
I have this basic code that does not work. How can I use XPath with html5lib for PHP? Or is there any other way to use XPath with HTML5?
$url = 'http://en.wikipedia.org/wiki/PHP';
$response = GuzzleHttp\get($url);
$html5 = new Masterminds\HTML5();
$dom = $html5->loadHTML($response);
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//h1');
//$elements = $dom->getElementsByTagName('h1');
foreach ($elements as $element)
{
var_dump($element);
}
No elements are found. Using $xpath->query('.') works to get the root element (so XPath in general seems to work). $dom->getElementsByTagName('h1') works.
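These symptoms are consistent with the elements living in the XHTML namespace, in which case an unprefixed `//h1` matches nothing. The sketch below reproduces that behaviour with plain `DOMDocument::loadXML()` so it runs without any third-party library. That Masterminds\HTML5 produces namespaced elements by default is an assumption here, and the `h` prefix is an arbitrary choice.

```php
<?php
// A tiny document whose elements are in the XHTML namespace.
$xml = '<html xmlns="http://www.w3.org/1999/xhtml">'
     . '<body><h1>PHP</h1></body></html>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);

// An unprefixed name test only matches elements in no namespace.
$plainCount = $xpath->query('//h1')->length;

// Bind a prefix to the XHTML namespace and use it in the query.
$xpath->registerNamespace('h', 'http://www.w3.org/1999/xhtml');
$namespacedCount = $xpath->query('//h:h1')->length;

// getElementsByTagName() matches on the tag name alone.
$byTagCount = $dom->getElementsByTagName('h1')->length;

echo "plain: $plainCount, namespaced: $namespacedCount, by tag: $byTagCount\n";
```

If the assumption holds, calling `registerNamespace()` on the `DOMXPath` built from the html5lib document and querying `//h:h1` should behave the same way. The library also appears to offer a `disable_html_ns` constructor option, which is not verified here.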
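I have searched countless pages trying to find an answer that actually works. I have tried library files written specifically for warning and error handling, but even when I suppress all warnings and errors, this last warning still shows: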
Warning: DOMDocument::loadHTML(): Empty string supplied as input
My PHP handling is below. The code works perfectly as long as the user enters an actual URL, but when the user enters something that is not a URL, the warning above is displayed.
if (isset($_GET['article_url'])){
$title = 'contact us';
$str = @file_get_contents($_GET['article_url']);
$test1 = str_word_count(strip_tags(strtolower($str)));
if($test1 === FALSE) { $test = '0'; }
if ($test1 > '550') {
echo '<div><i class="fa fa-check-square-o" style="color:green"></i> This article has '.$test1.' words.';
} else {
echo '<div><i class="fa fa-times-circle-o" style="color:red"></i> This article has '.$test1.' words. You are required to have a minimum of 500 words.</div>';
}
$document = new DOMDocument();
$libxml_previous_state = libxml_use_internal_errors(true);
$document->loadHTML($str);
libxml_use_internal_errors($libxml_previous_state); …
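The warning is raised because `loadHTML()` receives an empty value when `file_get_contents()` fails, and `libxml_use_internal_errors()` only covers libxml parse errors, not this argument check. A hedged sketch of one way to guard against it follows; `isHttpUrl()` and `loadArticleDom()` are hypothetical helper names introduced for illustration and are not part of the original code.

```php
<?php
// Hypothetical helper: accept only syntactically valid http(s) URLs.
function isHttpUrl($value)
{
    if (!is_string($value) || filter_var($value, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($value, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}

// Hypothetical helper: return a DOMDocument, or null when there is
// nothing to parse (false from file_get_contents(), or blank input).
function loadArticleDom($html)
{
    if (!is_string($html) || trim($html) === '') {
        return null;
    }

    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $document;
}

// Usage with the values from the snippet above would look roughly like:
//   $url = isset($_GET['article_url']) ? $_GET['article_url'] : '';
//   $str = isHttpUrl($url) ? @file_get_contents($url) : false;
//   $document = loadArticleDom($str);
//   if ($document === null) { /* report invalid input and stop */ }
var_dump(isHttpUrl('not a url'));
var_dump(loadArticleDom('') === null);
```

Checking the input before parsing also avoids relying on `@` suppression, which hides the failure without handling it.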