Mac*_*Mac 52 php curl title meta-tags
I want to figure out how to get the
<title>A common title</title>
<meta name="keywords" content="Keywords blabla" />
<meta name="description" content="This is the description" />
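even if they appear in any order. I've heard of PHP Simple HTML DOM Parser, but I don't really want to use it. Is there a solution that doesn't rely on PHP Simple HTML DOM Parser?
preg_match won't be able to do this if the HTML is invalid, will it?
Can cURL do something like this together with preg_match?
Facebook does something similar, but it works properly because it uses: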
<meta property="og:description" content="Description blabla" />
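I want something like this so that when someone posts a link, the title and meta tags are retrieved. If there are no meta tags, they are ignored or the user can set them manually (but I'll handle that part myself later).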
sha*_*mar 155
This is the way it should be done:
function file_get_contents_curl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$html = file_get_contents_curl("http://example.com/");
//parsing begins here:
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings about malformed markup
//defaults, in case the page lacks one of the tags:
$title = $description = $keywords = '';
$nodes = $doc->getElementsByTagName('title');
//get and display what you need:
if ($nodes->length > 0) {
    $title = $nodes->item(0)->nodeValue;
}
$metas = $doc->getElementsByTagName('meta');
for ($i = 0; $i < $metas->length; $i++)
{
    $meta = $metas->item($i);
    if($meta->getAttribute('name') == 'description')
        $description = $meta->getAttribute('content');
    if($meta->getAttribute('name') == 'keywords')
        $keywords = $meta->getAttribute('content');
}
echo "Title: $title". '<br/><br/>';
echo "Description: $description". '<br/><br/>';
echo "Keywords: $keywords";
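As a variation on the approach above, here is a minimal sketch that works on an HTML string you have already fetched, so it can be tried without a network call. The function name extract_page_meta is hypothetical. It swaps the @ operator for libxml_use_internal_errors(), queries with DOMXPath, compares name values case-insensitively, and returns null for anything the page does not contain.

```php
<?php
// Hypothetical helper: returns title, description and keywords from an HTML string.
function extract_page_meta(string $html): array
{
    $result = ['title' => null, 'description' => null, 'keywords' => null];
    if (trim($html) === '') {
        return $result; // nothing to parse
    }

    // Collect libxml parse warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // The first <title> element, wherever it sits in the document.
    $titleNode = $xpath->query('//title')->item(0);
    if ($titleNode !== null) {
        $result['title'] = trim($titleNode->textContent);
    }

    // Only <meta> elements that carry both a name and a content attribute.
    foreach ($xpath->query('//meta[@name and @content]') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if ($name === 'description' || $name === 'keywords') {
            $result[$name] = $meta->getAttribute('content');
        }
    }
    return $result;
}

// Tags in a different order, mixed-case name, no description, unclosed <p>.
$html = '<html><head><meta name="Keywords" content="Keywords blabla" />'
      . '<title>A common title</title></head><body><p>unclosed</body></html>';
print_r(extract_page_meta($html));
```

Because missing tags come back as null, the caller can decide whether to ignore them or ask the user for a value, which is what the question describes.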
Bob*_*eey 35
<?php
// Assuming the above tags are at www.example.com
$tags = get_meta_tags('http://www.example.com/');
// Notice how the keys are all lowercase now, and
// how . was replaced by _ in the key.
echo $tags['author']; // name
echo $tags['keywords']; // php documentation
echo $tags['description']; // a php manual
echo $tags['geo_position']; // 49.33;-86.59
?>
get_meta_tags will help you with everything except the title. To get the title, just use a regular expression.
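$url = 'http://some.url.com';
// [^>]* also accepts a <title> tag that carries attributes
preg_match("/<title[^>]*>(.+)<\/title>/siU", file_get_contents($url), $matches);
$title = isset($matches[1]) ? trim($matches[1]) : ''; // empty when no title was found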
Hope that helps.
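On the question of whether preg_match copes with invalid HTML: a regex can still find the title in markup that a strict parser would reject, as long as the <title> element itself is intact. The following is a minimal sketch under that assumption. The name extract_title_regex is hypothetical, and the input is an HTML string that has already been fetched.

```php
<?php
// Hypothetical helper: pulls the <title> text out of raw HTML with a regex.
function extract_title_regex(string $html): ?string
{
    // <title\b[^>]*> allows attributes on the tag; (.*?) is lazy;
    // the s flag lets . span newlines and the i flag ignores case.
    if (!preg_match('/<title\b[^>]*>(.*?)<\/title>/si', $html, $matches)) {
        return null; // no complete <title> element found
    }
    // Collapse whitespace runs, then decode entities such as &amp;.
    $title = preg_replace('/\s+/', ' ', $matches[1]);
    return trim(html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8'));
}

var_dump(extract_title_regex("<TITLE id=\"t\">\n  Tom &amp; Jerry\n</TITLE>")); // string(11) "Tom & Jerry"
var_dump(extract_title_regex('<p>no title here</p>'));                          // NULL
```

A regex has no notion of comments or scripts, so a literal <title> inside one of those would match as well. The DOM-based answers are the safer default when the markup is untrusted.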
小智 7
get_meta_tags does not work for the title.
Only meta tags with a name attribute, such as
<meta name="description" content="the description">
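will be parsed.
Unfortunately, the built-in PHP function get_meta_tags() requires the name attribute, and some sites (Twitter, for example) leave it off in favor of the property attribute. This function, using a mix of regex and DOMDocument, returns a keyed array of meta tags from a web page. It checks for the name attribute first, then for the property attribute. It has been tested on Instagram, Pinterest and Twitter.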
/**
 * Extract metatags from a webpage
 */
function extract_tags_from_url($url) {
    $tags = array();
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $contents = curl_exec($ch);
    curl_close($ch);
    if (empty($contents)) {
        return $tags;
    }
    // Pull out only the <meta ... content="..."> tags, then let DOMDocument parse them.
    if (preg_match_all('/<meta([^>]+)content="([^>]+)>/', $contents, $matches)) {
        $doc = new DOMDocument();
        // The XML prefix is a common trick to have the fragment read as UTF-8;
        // @ silences warnings about malformed markup.
        @$doc->loadHTML('<?xml encoding="utf-8" ?>' . implode($matches[0]));
        foreach($doc->getElementsByTagName('meta') as $metaTag) {
            if($metaTag->getAttribute('name') != "") {
                $tags[$metaTag->getAttribute('name')] = $metaTag->getAttribute('content');
            }
            elseif ($metaTag->getAttribute('property') != "") {
                $tags[$metaTag->getAttribute('property')] = $metaTag->getAttribute('content');
            }
        }
    }
    return $tags;
}
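The cURL-based answers on this page set only four options. When the URL comes from a user-submitted link, it may be worth adding timeouts, a redirect limit and a protocol allow-list. The sketch below is one possible setup, not a definitive one: fetch_html, the timeout values and the user agent string are all assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: returns the response body, or null on any failure.
function fetch_html(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL             => $url,
        CURLOPT_RETURNTRANSFER  => true,   // return the body instead of printing it
        CURLOPT_HEADER          => false,  // body only, no response headers
        CURLOPT_FOLLOWLOCATION  => true,   // follow redirects...
        CURLOPT_MAXREDIRS       => 5,      // ...but not indefinitely
        CURLOPT_CONNECTTIMEOUT  => 5,      // seconds allowed for connecting
        CURLOPT_TIMEOUT         => $timeoutSeconds, // seconds allowed for the whole transfer
        CURLOPT_FAILONERROR     => true,   // treat HTTP status >= 400 as a failure
        // Only plain web URLs, both for the first request and for redirects.
        CURLOPT_PROTOCOLS       => CURLPROTO_HTTP | CURLPROTO_HTTPS,
        CURLOPT_REDIR_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
        // Assumed user agent string; some sites reject requests that send none.
        CURLOPT_USERAGENT       => 'Mozilla/5.0 (compatible; ExampleLinkPreview/1.0)',
    ]);
    $body = curl_exec($ch); // false on failure
    return (is_string($body) && $body !== '') ? $body : null;
}

// A file:// URL is refused before any request is made.
var_dump(fetch_html('file:///etc/passwd')); // NULL
```

A call such as fetch_html('http://example.com/') would then hand back the HTML string that the parsing functions expect, or null when the request fails or times out.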
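Shouldn't we be using OG?
The chosen answer is good, but it doesn't work when the site redirects (very common!) and it doesn't return OG tags, which are the new industry standard. Here is a small function that is more useful in 2018. It tries to get the OG tags and falls back to the regular meta tags if it can't: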
function getSiteOG( $url, $specificTags=array() ){
    $res = array('title' => null);
    $doc = new DOMDocument();
    @$doc->loadHTML(file_get_contents($url));
    // the page may have no <title> at all
    $titleNode = $doc->getElementsByTagName('title')->item(0);
    if ($titleNode) $res['title'] = $titleNode->nodeValue;
    foreach ($doc->getElementsByTagName('meta') as $m){
        $tag = $m->getAttribute('name') ?: $m->getAttribute('property');
        if(in_array($tag,['description','keywords']) || strpos($tag,'og:')===0) $res[str_replace('og:','',$tag)] = $m->getAttribute('content');
    }
    return $specificTags? array_intersect_key( $res, array_flip($specificTags) ) : $res;
}
How to use it:
/////////////
//SAMPLE USAGE:
print_r(getSiteOG("http://www.stackoverflow.com")); //note the incorrect url
/////////////
//OUTPUT:
Array
(
[title] => Stack Overflow - Where Developers Learn, Share, & Build Careers
[description] => Stack Overflow is the largest, most trusted online community for developers to learn, share their programming knowledge, and build their careers.
[type] => website
[url] => https://stackoverflow.com/
[site_name] => Stack Overflow
[image] => https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon@2.png?v=73d79a89bded
)
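One thing to watch with DOMDocument::loadHTML(): when the markup declares no charset, libxml typically does not read it as UTF-8, so accented or non-Latin text in a title or description can come back garbled. A minimal sketch of one common workaround follows, assuming the fetched bytes really are UTF-8. The helper name load_utf8_html is hypothetical, and the prepended charset declaration serves the same purpose as the XML prefix used in extract_tags_from_url().

```php
<?php
// Hypothetical helper: loads UTF-8 HTML into a DOMDocument without mangling non-ASCII text.
function load_utf8_html(string $html): DOMDocument
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // The prepended declaration tells libxml that the bytes that follow are UTF-8.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

// Sample markup without any charset declaration of its own.
$html = '<html><head><meta name="description" content="Café, naïve, 日本語"></head></html>';
foreach (load_utf8_html($html)->getElementsByTagName('meta') as $meta) {
    if ($meta->getAttribute('name') === 'description') {
        echo $meta->getAttribute('content'), "\n";
    }
}
```

The extra meta element has neither a name nor a property attribute, so the tag loops shown in the answers skip over it.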
A simple function that shows how to retrieve og: tags, the title and the description; adapt it to your needs:
function read_og_tags_as_json($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $HTML_DOCUMENT = curl_exec($ch);
    curl_close($ch);
    $res = array('title' => null);
    $doc = new DOMDocument();
    // @ silences warnings about malformed markup
    @$doc->loadHTML($HTML_DOCUMENT);
    // fetch <title>, if the page has one
    $titleNode = $doc->getElementsByTagName('title')->item(0);
    if( $titleNode ){
        $res['title'] = $titleNode->nodeValue;
    }
    // fetch og:tags
    foreach( $doc->getElementsByTagName('meta') as $m ){
        // if had property
        if( $m->getAttribute('property') ){
            $prop = $m->getAttribute('property');
            // here search only og:tags (property must start with "og:")
            if( preg_match("/^og:/i", $prop) ){
                // get results on an array -> nice for templating
                $res['og_tags'][] =
                    array( 'property' => $m->getAttribute('property'),
                           'content' => $m->getAttribute('content') );
            }
        }
        // end if had property
        // fetch <meta name="description" ... >
        if( $m->getAttribute('name') == 'description' ){
            $res['description'] = $m->getAttribute('content');
        }
    }
    // end foreach
    // render JSON
    echo json_encode($res, JSON_PRETTY_PRINT |
        JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
}
Returns for this page (there could be more info):
{
"title": "php - Getting title and meta tags from external website - Stack Overflow",
"og_tags": [
{
"property": "og:type",
"content": "website"
},
{
"property": "og:url",
"content": "/sf/ask/259795021/"
},
{
"property": "og:site_name",
"content": "Stack Overflow"
},
{
"property": "og:image",
"content": "https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon@2.png?v=73d79a89bded"
},
{
"property": "og:title",
"content": "Getting title and meta tags from external website"
},
{
"property": "og:description",
"content": "I want to try figure out how to get the\n\n<title>A common title</title>\n<meta name=\"keywords\" content=\"Keywords blabla\" />\n<meta name=\"description\" content=\"This is the descript..."
}
]
}
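To tie this back to the question's goal of showing a title and description when someone posts a link, here is a minimal sketch of the "prefer OG, fall back to regular tags" step. It assumes a flat array that maps each meta name or property to its content, which is the shape extract_tags_from_url() returns. The function name build_link_preview and the choice of fields are hypothetical.

```php
<?php
// Hypothetical helper: builds link-preview fields from a flat tag array.
// $tags maps a meta name/property to its content; $pageTitle is the <title> text or null.
function build_link_preview(array $tags, ?string $pageTitle): array
{
    // Returns the first non-empty value among the given keys, or null.
    $first = static function (array $keys) use ($tags): ?string {
        foreach ($keys as $key) {
            if (isset($tags[$key]) && trim($tags[$key]) !== '') {
                return trim($tags[$key]);
            }
        }
        return null;
    };

    // og:title wins; otherwise fall back to the <title> text when it is not blank.
    $title = $first(['og:title']);
    if ($title === null && $pageTitle !== null && trim($pageTitle) !== '') {
        $title = trim($pageTitle);
    }

    return [
        'title'       => $title,
        'description' => $first(['og:description', 'description']),
        'image'       => $first(['og:image']),
    ];
}

// A page with no OG title or description: the regular tags are used instead.
print_r(build_link_preview(
    ['description' => 'This is the description', 'og:image' => 'https://example.com/a.png'],
    'A common title'
));
```

Fields that stay null are the ones the question proposes to ignore or let the user fill in.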