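I've already seen this question, but it doesn't meet my needs. The answers there either pull the text from the meta description tag or generate an excerpt for an article whose body you already have.
What I actually want is to get the first few sentences of an article, the way Readability does. Is HTML parsing the best way to do this? This is what I'm currently using, but it isn't very reliable.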
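    function guessExcerpt($url) {
        $html = file_get_contents_curl($url);
        // Bail out if the download failed or returned an empty body.
        if (!is_string($html) || $html === '') {
            return null;
        }
        $doc = new DOMDocument();
        // Suppress the warnings that malformed real-world markup triggers.
        @$doc->loadHTML($html);
        $description = null; // stays null when the page has no description meta tag
        foreach ($doc->getElementsByTagName('meta') as $meta) {
            if (strtolower($meta->getAttribute('name')) === 'description') {
                $description = $meta->getAttribute('content');
            }
        }
        return $description;
    }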
    function file_get_contents_curl($url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_HEADER, 0);          // body only, no response headers
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);         // give up after 5 seconds
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow HTTP redirects
        $data = curl_exec($ch);                       // false on failure
        curl_close($ch);
        return $data;
    }
Here is a PHP port of Readability: https://github.com/feelinglucky/php-readability. Give it a try. The extraction results are similar to Readability's, since it implements Readability's algorithm.
    require 'lib/Readability.inc.php';
    $html = file_get_contents_curl($url);
    $html_input_charset = 'utf-8'; // charset of the fetched page; utf-8 is the default
    $Readability = new Readability($html, $html_input_charset);
    $ReadabilityData = $Readability->getContent();
    $title = $ReadabilityData['title'];
    $content = $ReadabilityData['content'];
You can then use the first few sentences of $content as the excerpt.
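One way to take those first few sentences is sketched below. It assumes $content is an HTML fragment in UTF-8. The helper name firstSentences and its $maxSentences parameter are made up for illustration and are not part of php-readability. The sentence splitting is deliberately naive (it breaks after ".", "!" or "?" followed by whitespace), so abbreviations such as "e.g." will be split incorrectly.

```php
<?php
// Hypothetical helper (not part of php-readability): returns the first
// $maxSentences sentences of an HTML fragment as plain text.
function firstSentences(string $html, int $maxSentences = 2): string
{
    // Add a space after block-level closers and <br> so that text from
    // adjacent paragraphs does not run together once the tags are stripped.
    $html = preg_replace('/<\/(p|div|li|h[1-6]|blockquote)>|<br\s*\/?>/i', '$0 ', $html) ?? $html;

    // Strip the tags and decode entities so only readable text remains.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace (including newlines) into single spaces.
    $text = trim(preg_replace('/\s+/u', ' ', $text) ?? $text);
    if ($text === '') {
        return '';
    }

    // Naive split: a sentence ends at ., ! or ? followed by whitespace.
    $sentences = preg_split('/(?<=[.!?])\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($sentences === false) {
        return $text; // regex failure (e.g. invalid UTF-8): return the text unsplit
    }

    return implode(' ', array_slice($sentences, 0, $maxSentences));
}
```

With the variables from the answer, the call would look something like `$excerpt = firstSentences($content, 3);`. For text with many abbreviations or for languages that do not end sentences with these punctuation marks, a proper sentence tokenizer would be a better fit than this regex.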