sar*_*ath 2 html php file-get-contents
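I want to get the links "http://www.w3schools.com/default.asp" and "http://www.google.com" from this web page. I only want the links in the <a> tags inside <div class="link">; the page contains many other <a> tags that I don't want. How can I retrieve just those specific links? Can anyone help?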
<div class="link">
<a href="http://www.w3schools.com/default.asp">
<h4>W3 Schools</h4>
</a>
</div>
<div class="link">
<a href="http://www.google.com">
<h4>Google</h4>
</a>
</div>
Use a DOM parser such as DOMDocument for this:
$dom = new DOMDocument;
$dom->loadHTML($html); // $html is a string containing the HTML
foreach ($dom->getElementsByTagName('a') as $link) {
echo $link->getAttribute('href').'<br/>';
}
Output:
http://www.w3schools.com/default.asp
http://www.google.com
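The snippet above assumes `$html` already holds the page source. Since the question is tagged file-get-contents, here is a minimal sketch of how the two steps might be combined. The helper name `fetchAllLinks` is hypothetical (not from the original answer), and reading from a URL assumes `allow_url_fopen` is enabled on your server.

```php
<?php
/**
 * Hypothetical helper: reads HTML from a URL or local path and returns
 * the href of every <a> element that has one.
 */
function fetchAllLinks(string $source): array
{
    // file_get_contents returns false on failure; "@" hides the PHP warning
    $html = @file_get_contents($source);
    if ($html === false || $html === '') {
        throw new RuntimeException("Could not read HTML from $source");
    }

    // Real-world pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        if ($link->hasAttribute('href')) {
            $hrefs[] = $link->getAttribute('href');
        }
    }
    return $hrefs;
}

// Example call (placeholder URL, not executed here):
// $links = fetchAllLinks('http://example.com/page.html');
```

Returning an array rather than echoing keeps the extraction separate from the output, so the caller can decide how to print or filter the links.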
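Update: If you only want the links inside a specific <div>, you can use an XPath expression to find the links within that div, then loop over them to read the href attribute: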
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
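// Match <div> elements whose class list contains the exact token "link"
// (a plain contains(@class, 'link') would also match e.g. class="links")
$links_inside_div = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' link ')]/a");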
foreach ($links_inside_div as $link) {
echo $link->getAttribute('href').'<br/>';
}
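To check the XPath approach against the markup from the question, a self-contained sketch might look like the following. The helper `extractLinksByDivClass` and the extra `links-footer` div are assumptions added for illustration; the query matches the class as a whole token, so `links-footer` is skipped even though it contains the substring "link".

```php
<?php
// Sample markup from the question, plus one extra (hypothetical) div
// whose links should be ignored.
$html = <<<'HTML'
<div class="link">
  <a href="http://www.w3schools.com/default.asp"><h4>W3 Schools</h4></a>
</div>
<div class="link">
  <a href="http://www.google.com"><h4>Google</h4></a>
</div>
<div class="links-footer">
  <a href="http://www.example.com/ignored">Ignored</a>
</div>
HTML;

/**
 * Hypothetical helper: returns the href of every <a> that is a direct child
 * of a <div> whose class list contains the exact token $class.
 * Assumes $class is a trusted value without quote characters.
 */
function extractLinksByDivClass(string $html, string $class): array
{
    $previous = libxml_use_internal_errors(true); // suppress parser warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Pad the class attribute with spaces so only whole tokens match
    $query = sprintf(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]/a[@href]",
        $class
    );

    $hrefs = [];
    foreach ($xpath->query($query) as $anchor) {
        $hrefs[] = $anchor->getAttribute('href');
    }
    return $hrefs;
}

$links = extractLinksByDivClass($html, 'link');
print_r($links);
```

If the `<a>` tags on the real page are nested deeper inside the div, replacing `/a` with `//a` in the query would match descendants at any depth.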