Why does cURL return an empty string?

Nic*_*ick 16 php curl domdocument

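I have a problem where PHP's cURL returns an empty string for some URLs. I'm trying to parse the OG metadata of different web pages, and it works for every site I've tried except NYTimes. Here is my code so far.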

print_r(get_og_metadata('http://somewebsite.com'));


public function get_data($url)
{
    $ch = curl_init();
    $timeout = 5;
    // the url to fetch
    curl_setopt($ch, CURLOPT_URL, $url);
    // return result as a string rather than direct output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // max seconds to wait while connecting (not the total execution time)
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

public function get_og_metadata($url)
{
    libxml_use_internal_errors(TRUE);
    $data = $this->get_data($url);
    $doc = new DOMDocument();
    $doc->loadHTML($data);

    $xpath = new DOMXPath($doc);
    $query = '//*/meta[starts-with(@property, \'og:\')]';

    $metadatas = $xpath->query($query);
    $result = array();
    foreach($metadatas as $metadata)
    {
        $property = $metadata->getAttribute('property');
        $content = $metadata->getAttribute('content');
        $result[$property] = $content;
    }

    return $result;
}
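A side note on the parsing half: `curl_exec()` returns `false` on failure, so `loadHTML()` can end up being handed an empty value. Below is a minimal sketch that separates parsing from fetching and returns an empty array for empty input. The standalone function name `parse_og_metadata` and the sample HTML are assumptions for illustration, not part of the original class.

```php
<?php
// Hypothetical helper: extracts Open Graph <meta> tags from an HTML string.
// Returns an empty array when the input is false or blank (e.g. a failed cURL call).
function parse_og_metadata($html)
{
    if (!is_string($html) || trim($html) === '') {
        return array();
    }

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = array();
    foreach ($xpath->query('//meta[starts-with(@property, "og:")]') as $meta) {
        $result[$meta->getAttribute('property')] = $meta->getAttribute('content');
    }

    return $result;
}

// Made-up sample document, used only to show the call.
$sample = '<html><head><meta property="og:title" content="Example"></head><body></body></html>';
print_r(parse_og_metadata($sample));
```

With parsing isolated like this, an empty result for a known-good HTML string points at the XPath query, while an empty result only for certain URLs points back at the fetch.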

Abh*_*oel 27

These five lines did the magic for me.

   curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17');
   curl_setopt($ch, CURLOPT_AUTOREFERER, true); 
   curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
   curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
   curl_setopt($ch, CURLOPT_VERBOSE, 1);
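For anyone wondering what each of those options does, here is a sketch that collects the same five into an array for `curl_setopt_array()`, with a comment per option. The function name `browser_like_curl_options` is hypothetical, and which option actually matters will depend on the site.

```php
<?php
// Hypothetical helper: the five options from the answer above, plus the URL.
function browser_like_curl_options($url)
{
    return array(
        CURLOPT_URL            => $url,
        // Send a browser-like User-Agent header; some sites reject requests without one.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17',
        // Set the Referer header automatically when following a redirect.
        CURLOPT_AUTOREFERER    => true,
        // Return the response body from curl_exec() instead of printing it.
        CURLOPT_RETURNTRANSFER => true,
        // Follow "Location:" redirects; without this a 301/302 yields the redirect response itself.
        CURLOPT_FOLLOWLOCATION => true,
        // Write request/response details to STDERR; useful while debugging only.
        CURLOPT_VERBOSE        => true,
    );
}

$ch = curl_init();
$applied = curl_setopt_array($ch, browser_like_curl_options('http://somewebsite.com'));
// $data = curl_exec($ch); // would perform the actual request
```

`curl_setopt_array()` returns `false` as soon as one option cannot be set, so checking `$applied` is a cheap sanity test.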

  • `CURLOPT_FOLLOWLOCATION` and `CURLOPT_USERAGENT` saved me. Thanks (2 upvotes)

Zir*_*ode 15

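My guess is that sites like the New York Times guard against this kind of automated request. Most likely the check is based on the user agent, which you can spoof like this: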

curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17');

That's the most common user agent, btw.

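  • `curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);` worked for me (10 upvotes)
  • Setting the user agent didn't work for me, but setting auto-referer to TRUE did. Your answer still helped me rethink what might be causing the problem! `curl_setopt($ch, CURLOPT_AUTOREFERER, true);` (4 upvotes)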
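Since different options fixed this for different people, it may help to look at what cURL itself reports before guessing. The sketch below is one way to do that, assuming a hypothetical helper named `fetch_with_diagnostics`; it only uses `curl_errno()`, `curl_error()` and `curl_getinfo()`.

```php
<?php
// Hypothetical helper: fetches a URL and returns the body together with
// the details that usually explain an empty or false result.
function fetch_with_diagnostics($url, $timeout = 5)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // limit for the connect phase
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // limit for the whole request

    $body = curl_exec($ch); // false on failure, string otherwise

    return array(
        'body'         => $body,
        'errno'        => curl_errno($ch),                          // 0 means no cURL-level error
        'error'        => curl_error($ch),                          // human-readable message
        'http_code'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),    // e.g. 301, 403; 0 if no response
        'redirect_url' => curl_getinfo($ch, CURLINFO_REDIRECT_URL), // target of an unfollowed redirect
    );
}
```

Calling `fetch_with_diagnostics('http://somewebsite.com')` and printing everything except `body` should narrow things down: a 301/302 with a `redirect_url` suggests `CURLOPT_FOLLOWLOCATION`, a 403 suggests user-agent filtering, and a non-zero `errno` with an SSL message suggests a certificate problem.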

Mic*_*son 6

(That answer is also mine.)

This is what did it for me. cURL was trying to verify the SSL certificate, which I happened not to need in this particular case.

curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
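If verification fails only because PHP cannot find a CA bundle, an alternative is to leave verification on and tell cURL where the bundle is. This is a sketch under that assumption; the function name `verified_ssl_options` and the bundle path are hypothetical, so substitute a path that exists on your system.

```php
<?php
// Hypothetical helper: keeps certificate checks enabled and supplies a CA bundle.
function verified_ssl_options($caBundlePath)
{
    return array(
        // Verify that the server certificate chains to a trusted CA.
        CURLOPT_SSL_VERIFYPEER => true,
        // 2 = verify that the certificate matches the requested host name.
        CURLOPT_SSL_VERIFYHOST => 2,
        // File containing the trusted CA certificates in PEM format.
        CURLOPT_CAINFO         => $caBundlePath,
    );
}

// Assumed example path; not taken from the original answer.
$sslOptions = verified_ssl_options('/path/to/cacert.pem');
// curl_setopt_array($ch, $sslOptions); // apply to an existing handle before curl_exec()
```

Turning both checks off, as in the answer above, does make the error go away, but the connection is then no longer protected against a man-in-the-middle.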