Facebook is unable to scrape my URL

Nin*_*nja 8 facebook codeigniter opengraph facebook-opengraph

My page has the HTML structure shown below. I have added all the meta tags, but Facebook still cannot scrape any information from my site.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"  xmlns:fb="http://www.facebook.com/2008/fbml">
    <head>
            <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
            <title>My Site</title>
            <meta content="This is my title" property="og:title">
            <meta content="This is my description" property="og:description">
            <meta content="http://ia.media-imdb.com/images/rock.jpg" property="og:image">
            <meta content="<MYPAGEID>" property="fb:page_id">
            .......
    </head>
    <body>
    .....

When I enter the URL in the Facebook debugger (https://developers.facebook.com/tools/debug), I get the following message:

Scrape Information
Response Code   404

Critical Errors That Must Be Fixed
Bad Response Code   URL returned a bad HTTP response code.


Errors that must be fixed

Missing Required Property   The 'og:url' property is required, but not present.
Missing Required Property   The 'og:type' property is required, but not present.
Missing Required Property   The 'og:title' property is required, but not present.


Open Graph Warnings That Should Be Fixed
Inferred Property   The 'og:url' property should be explicitly provided, even if a value can be inferred from other tags.
Inferred Property   The 'og:title' property should be explicitly provided, even if a value can be inferred from other tags.

Why is Facebook not reading the meta tag information? The page is publicly accessible and is not hidden behind a login or anything similar.

UPDATE

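OK, I did some debugging and this is what I found. I have htaccess rules set up in my directory: I am using the PHP CodeIgniter framework, and the htaccess rules remove index.php from the URL.

So when I give the Facebook debugger (https://developers.facebook.com/tools/debug) the URL without index.php, Facebook reports a 404, but when I give it the URL with index.php, it is able to parse my page.

Now, how can I get Facebook to scrape the content when the URL does not contain index.php?

Here are my htaccess rules: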

<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /

    #Removes access to the system folder by users.
    #Additionally this will allow you to create a System.php controller,
    #previously this would not have been possible.
    #'system' can be replaced if you have renamed your system folder.
    RewriteCond %{REQUEST_URI} ^system.*
    RewriteRule ^(.*)$ /index.php?/$1 [L]

    #When your application folder isn't in the system folder
    #This snippet prevents user access to the application folder
    #Submitted by: Fabdrol
    #Rename 'application' to your applications folder name.
    RewriteCond %{REQUEST_URI} ^application.*
    RewriteRule ^(.*)$ /index.php?/$1 [L]

    #Checks to see if the user is attempting to access a valid file,
    #such as an image or css document, if this isn't true it sends the
    #request to index.php
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ index.php?/$1 [L]
</IfModule>

<IfModule !mod_rewrite.c>
    # If we don't have mod_rewrite installed, all 404's
    # can be sent to index.php, and everything works as normal.
    # Submitted by: ElliotHaughin

    ErrorDocument 404 /index.php
</IfModule>

Lix*_*Lix 9

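The Facebook documentation has detailed information about the Open Graph protocol and how to include the correct meta tags so that Facebook can scrape your URL accurately.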

https://developers.facebook.com/docs/opengraphprotocol/

Basically, what you have to do is include some special og: tags in place of (or in addition to) your existing meta tags.

  <head>
    <title>Ninja Site</title>
    <meta property="og:title" content="The Ninja"/>
    <meta property="og:type" content="movie"/>
    <meta property="og:url" content="http://www.nin.ja"/>
    <meta property="og:image" content="http://nin.ja/ninja.jpg"/>
    <meta property="og:site_name" content="Ninja"/>
    <meta property="fb:admins" content="USER_ID"/>
    <meta property="og:description"
          content="Superhuman or supernatural powers were often
                   associated with the ninja. Some legends include
                   flight, invisibility and shapeshifting..."/>
    ...
  </head>
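
As a quick local check before going back to the debugger, something like the following Python sketch can report which of the properties the debugger lists as required (og:url, og:type, og:title) are absent from a page's HTML. The class and function names are made up for illustration, and it only looks at `<meta property="..." content="...">` tags, so it says nothing about how Facebook itself parses the page.

```python
from html.parser import HTMLParser

# Properties the debugger output in the question reports as required.
REQUIRED_OG_PROPERTIES = ("og:url", "og:type", "og:title")


class OpenGraphCollector(HTMLParser):
    """Collects <meta property="..." content="..."> pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("property")
        if name:
            self.properties[name] = attributes.get("content") or ""


def missing_og_properties(html):
    """Return the required og: properties that are absent or empty in html."""
    collector = OpenGraphCollector()
    collector.feed(html)
    return [p for p in REQUIRED_OG_PROPERTIES if not collector.properties.get(p)]
```

Run against the head section from the question, this would flag og:url and og:type. The question's page does declare og:title, so the debugger's "missing og:title" message there presumably comes from it having scraped the 404 response rather than the page itself.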

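If you have an .htaccess file that redirects things and makes it difficult for Facebook to scrape your URL, you may be able to get around it by detecting Facebook's crawler in .htaccess and serving it the correct tags. I believe the user agent that the Facebook crawler sends is something like this: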

facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
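
To see how the server answers a request carrying that user agent, without going through the debugger each time, a request can be sent with the same User-Agent header. This is a minimal Python sketch using only the standard library; the function names are hypothetical, and it only imitates the header, so anything else the real crawler does (IP ranges, redirect handling) is not reproduced.

```python
import urllib.error
import urllib.request

# User agent string quoted in the answer above.
FACEBOOK_USER_AGENT = (
    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
)


def build_crawler_request(url):
    """Build a GET request that presents itself with the Facebook user agent."""
    return urllib.request.Request(url, headers={"User-Agent": FACEBOOK_USER_AGENT})


def fetch_status(url):
    """Return the HTTP status code the server gives to the imitated crawler."""
    try:
        with urllib.request.urlopen(build_crawler_request(url), timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the code is still the answer we want.
        return error.code
```

Calling `fetch_status` once on the URL without index.php and once on the URL with it should show whether the 404 from the question is reproduced for this user agent.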

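The documentation also has a section on making sure their crawler can access your site.

Depending on your configuration, you can test this by looking at your server's access_log. On a UNIX system running Apache, the access log is typically located at /var/log/httpd/access_log (the exact path varies by distribution).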
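
Scanning the log by hand gets tedious, so here is a small Python sketch that pulls out the crawler's requests together with the status code each one received. It assumes the log uses Apache's "combined" format (the plain "common" format does not record the user agent), and the function name is made up for illustration.

```python
import re

# Matches the request, status, size, referer and user-agent fields of a
# "combined" format log line (an assumption about the server's LogFormat).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def facebook_hits(lines):
    """Return (path, status) for every request made by the Facebook crawler."""
    hits = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("agent").startswith("facebookexternalhit"):
            hits.append((match.group("path"), int(match.group("status"))))
    return hits
```

A file object can be passed directly, for example `with open("/var/log/httpd/access_log") as log: print(facebook_hits(log))`. A 404 next to the rewritten URL would confirm that the crawler's request is the one failing.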

So you could use an entry like this in your .htaccess file:

RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit
RewriteRule ^(.*)$ ogtags.php?$1 [L,QSA]

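The [L,QSA] flags I placed there mean the following: L (Last) states that this is the last rule that will be applied to the current request, and QSA (Query String Append) states that any query string on the original request will be passed along when the URL is rewritten. For example, a URL such as: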

https://example.com/?id=foo&action=bar

would effectively be passed to ogtags.php like this: ogtags.php?id=foo&action=bar. Your ogtags.php file would then generate dynamic og: meta tags based on the parameters passed to it.
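
The logic such a script needs is small. The sketch below shows it in Python purely as an illustration; a real ogtags.php would do the same thing in PHP. The mapping from the `id` and `action` parameters to a title, the "website" type, and the `base_url` default are all assumptions, since the answer does not say what the parameters mean.

```python
from html import escape
from urllib.parse import parse_qs, urlencode


def render_og_tags(query_string, base_url="https://example.com/"):
    """Build og: meta tags from a query string such as 'id=foo&action=bar'."""
    params = parse_qs(query_string)
    page_id = params.get("id", [""])[0]
    action = params.get("action", [""])[0]

    # Hypothetical mapping: a real site would look the page up by its id.
    properties = [
        ("og:title", f"{page_id} - {action}"),
        ("og:type", "website"),
        ("og:url", f"{base_url}?{urlencode({'id': page_id, 'action': action})}"),
    ]
    # Escape attribute values so user-supplied parameters cannot break the markup.
    return "\n".join(
        f'<meta property="{name}" content="{escape(value, quote=True)}"/>'
        for name, value in properties
    )
```

Whatever language is used, the values echoed into the tags come from the request, so they should be escaped as above before being written into the page.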

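Now, whenever your .htaccess file detects the Facebook user agent, it will pass the request to the ogtags.php file (which can contain the correct og: meta information). Be mindful of any other rules you have in .htaccess and how they might affect the new rule.

Given the .htaccess entries you have detailed, I would suggest placing this new "Facebook rule" as the first rule, so that it is evaluated before the rules that route requests to index.php.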