Parsing URLs from a website

Question

Bil*_*son 5 · Tags: html, php, parsing, html-parsing

Just wondering if anyone can help me further with the following. I want to parse the URLs on this website: http://www.directorycritic.com/free-directory-list.html?pg=1&sort=pr

I have the following code:

<?php
$url = "http://www.directorycritic.com/free-directory-list.html?pg=1&sort=pr";
// Fetch the page; @ hides PHP's warning so die() reports the failure instead
$input = @file_get_contents($url) or die("Could not access file: $url");
// Match <a ... href=...>text</a>: group 2 is the href value, group 3 the link text
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
if (preg_match_all("/$regexp/siU", $input, $matches)) {
    // $matches[2] = array of link addresses
    // $matches[3] = array of link text - including HTML code
}
?>

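At the moment this doesn't do anything. What I need to do is scrape all the URLs from the table on all 16 pages, and I would really appreciate some help with how to modify the code above and write the URLs out to a text file.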
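One way to cover all 16 pages is to loop over the `pg` query parameter and append every match to a text file. The following is a minimal sketch, assuming the other pages use the same `?pg=N&sort=pr` format as page 1; `pageUrl()`, `extractLinks()` and `scrapeToFile()` are hypothetical names. Because it reuses the question's regex, it collects every `<a href>` on each page, not only the links inside the directory table; narrowing it to the table would require knowing that page's markup.

```php
<?php
// Hypothetical helper: URL of listing page $pg, ASSUMING every page follows
// the ?pg=N&sort=pr pattern seen on page 1.
function pageUrl(int $pg): string
{
    return "http://www.directorycritic.com/free-directory-list.html?pg={$pg}&sort=pr";
}

// Returns the href values (capture group 2) matched by the question's regex.
function extractLinks(string $html): array
{
    $regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
    if (preg_match_all("/$regexp/siU", $html, $matches)) {
        return $matches[2];
    }
    return [];
}

// Fetches pages 1..$pages, de-duplicates the links and writes them to
// $outFile, one per line. Returns the number of unique links written.
function scrapeToFile(int $pages, string $outFile): int
{
    $unique = [];
    for ($pg = 1; $pg <= $pages; $pg++) {
        $html = @file_get_contents(pageUrl($pg));
        if ($html === false) {
            error_log("Could not access page $pg");
            continue; // skip this page instead of aborting the whole run
        }
        foreach (extractLinks($html) as $link) {
            $unique[$link] = true; // array keys act as a set
        }
    }
    $lines = array_keys($unique);
    file_put_contents($outFile, $lines ? implode(PHP_EOL, $lines) . PHP_EOL : '');
    return count($unique);
}
```

Calling `scrapeToFile(16, 'urls.txt');` would then fetch the 16 pages in turn and write the collected URLs to `urls.txt`, one per line.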

Answer 1

Nav*_*eed 5

Use an HTML DOM parser; the code below relies on the Simple HTML DOM library:

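<?php
// file_get_html() and find() come from the third-party Simple HTML DOM library;
// simple_html_dom.php is assumed to have been downloaded next to this script.
require_once 'simple_html_dom.php';

$html = file_get_html('http://www.example.com/');

// Find all links
$links = array();
foreach ($html->find('a') as $element) {
    $links[] = $element->href;
}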

The $links array now holds every URL on the given page, and you can use those URLs for further parsing.
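If you would rather not depend on a third-party library, PHP's bundled DOM extension (`DOMDocument` together with `DOMXPath`) can collect the same `href` values. This is a swapped-in alternative, not what the answer above uses; `linksFromHtml()` is a hypothetical name, and fetching the page (for example with `file_get_contents()`) is left out so the function can be tried on a plain string.

```php
<?php
// Returns the href attribute of every <a> element in $html, in document
// order, using PHP's built-in DOM extension.
function linksFromHtml(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world markup is rarely valid HTML: buffer libxml warnings
    // instead of printing them, then discard them after loading.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    $xpath = new DOMXPath($doc);
    // Only anchors that actually carry an href attribute
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}
```

Because the parser reads `href` as a real attribute, single-quoted or unquoted values and neighbouring attributes such as `data-href` do not confuse it, which is exactly where hand-written regular expressions tend to break.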

Parsing HTML with regular expressions is not a good idea. Here are some related posts:
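To make the regex problem concrete, the sketch below runs the question's pattern, unchanged, on two small hand-written anchors (hypothetical sample input): one with a single-quoted `href`, and one that carries a `data-href` attribute before the real `href`. `regexLinks()` is a hypothetical wrapper name.

```php
<?php
// The question's pattern, unchanged; returns capture group 2 (the href value).
function regexLinks(string $html): array
{
    $regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
    preg_match_all("/$regexp/siU", $html, $matches);
    return $matches[2];
}

$html = "<a href='/single'>S</a>\n"
      . "<a data-href=\"/wrong\" href=\"/right\">W</a>";

// Prints '/single' with its quotes still attached, and /wrong instead of /right
print_r(regexLinks($html));
```

The first value keeps its single quotes because the pattern only strips double quotes, and the second comes from `data-href` because `href=` also matches inside that attribute name. A DOM parser would return `/single` and `/right` for the same input.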

Edit:

Gordon describes some other HTML parsing tools in the comments below:

Archived: 14 years, 8 months ago
Views: 6452
Last activity: 14 years, 3 months ago