use*_*355 7 php xpath domdocument
我在下面的html代码中有一个广告列表.我需要的是一个PHP循环来获取每个广告的以下元素:
<a>标记的href属性)<img>标记的src属性)<div class="title">标签的html内容)<div class="ads">
<a href="http://path/to/ad/1">
<div class="ad">
<div class="image">
<div class="wrapper">
<img src="http://path/to/ad/1/image.jpg">
</div>
</div>
<div class="detail">
<div class="title">Ad #1</div>
</div>
</div>
</a>
<a href="http://path/to/ad/2">
<div class="ad">
<div class="image">
<div class="wrapper">
<img src="http://path/to/ad/2/image.jpg">
</div>
</div>
<div class="detail">
<div class="title">Ad #2</div>
</div>
</div>
</a>
</div>
Run Code Online (Sandbox Code Playgroud)
我设法使用下面的PHP代码获取广告网址.
$d = new DOMDocument();
$d->loadHTML($ads); // the variable $ads contains the HTML code above
$xpath = new DOMXPath($d);
$ls_ads = $xpath->query('//a');
foreach ($ls_ads as $ad) {
$ad_url = $ad->getAttribute('href');
print("AD URL : $ad_url");
}
Run Code Online (Sandbox Code Playgroud)
但我没有设法获得其他2个元素(图片网址和标题).任何的想法?
use*_*355 11
我设法得到了我需要的代码(基于Khue Vu的代码):
$d = new DOMDocument();
$d->loadHTML($ads); // the variable $ads contains the HTML code above
$xpath = new DOMXPath($d);
$ls_ads = $xpath->query('//a');
foreach ($ls_ads as $ad) {
// get ad url
$ad_url = $ad->getAttribute('href');
// set current ad object as new DOMDocument object so we can parse it
$ad_Doc = new DOMDocument();
$cloned = $ad->cloneNode(TRUE);
$ad_Doc->appendChild($ad_Doc->importNode($cloned, True));
$xpath = new DOMXPath($ad_Doc);
// get ad title
$ad_title_tag = $xpath->query("//div[@class='title']");
$ad_title = trim($ad_title_tag->item(0)->nodeValue);
// get ad image
$ad_image_tag = $xpath->query("//img/@src");
$ad_image = $ad_image_tag->item(0)->nodeValue;
}
Run Code Online (Sandbox Code Playgroud)
Khu*_* Vu 10
对于其他元素,你只需要做同样的事情:
foreach ($ls_ads as $ad) {
$ad_url = $ad->getAttribute('href');
print("AD URL : $ad_url");
$ad_Doc = new DOMDocument();
$ad_Doc->documentElement->appendChild($ad_Doc->importNode($ad));
$xpath = new DOMXPath($ad_Doc);
$img_src = $xpath->query("//img[@src]");
$title = $xpath->query("//div[@class='title']");
}
Run Code Online (Sandbox Code Playgroud)