get_headers inconsistency

Bab*_*aba 4 php validation

Running the following code

var_dump(get_headers("http://www.domainnnnnnnnnnnnnnnnnnnnnnnnnnnn.com/CraxyFile.jpg"));

returns HTTP 200 instead of 404 for any domain or URL that does not exist:

Array
(
    [0] => HTTP/1.1 200 OK
    [1] => Server: nginx/1.1.15
    [2] => Date: Mon, 08 Oct 2012 12:29:13 GMT
    [3] => Content-Type: text/html; charset=utf-8
    [4] => Connection: close
    [5] => Set-Cookie: PHPSESSID=3iucojet7bt2peub72rgo0iu21; path=/; HttpOnly
    [6] => Expires: Thu, 19 Nov 1981 08:52:00 GMT
    [7] => Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    [8] => Pragma: no-cache
    [9] => Set-Cookie: bypassStaticCache=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; httponly
    [10] => Set-Cookie: bypassStaticCache=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/; httponly
    [11] => Vary: Accept
)

If you run

var_dump(get_headers("http://www.domain.com/CraxyFile.jpg"));

you get

Array
(
    [0] => HTTP/1.1 404 Not Found
    [1] => Date: Mon, 08 Oct 2012 12:32:18 GMT
    [2] => Content-Type: text/html
    [3] => Content-Length: 8727
    [4] => Connection: close
    [5] => Server: Apache
    [6] => Vary: Accept-Encoding
)

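There are many examples where get_headers has been recommended as the way to check whether a URL exists.

Is this a bug, or is get_headers simply not a reliable way to validate URLs?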
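Because `get_headers()` returns a plain list of header lines, checking for a 404 means finding the status line yourself. A minimal sketch, assuming the default numerically indexed format: when redirects are followed, the array holds one `HTTP/...` status line per response, so the last one is the status of the final response. The name `finalStatusCode()` is hypothetical.

```php
<?php
/**
 * Hypothetical helper: return the status code of the final response from the
 * value returned by get_headers($url), or null if there is none.
 *
 * @param array|false $headers
 */
function finalStatusCode($headers): ?int
{
    if (!is_array($headers)) {
        // get_headers() returns false when the request fails entirely
        // (e.g. the host cannot be resolved or the connection is refused).
        return null;
    }
    $status = null;
    foreach ($headers as $key => $line) {
        // Only inspect plain string lines stored under numeric keys;
        // status lines look like "HTTP/1.1 404 Not Found".
        if (is_int($key) && is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1]; // keep overwriting so the last status line wins
        }
    }
    return $status;
}
```

Used as `finalStatusCode(get_headers($url))`, this would still report 200 for the first URL in the question: the status comes from whichever server DNS pointed the request at, so parsing alone cannot tell you that the domain does not exist.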


Update 1

To clarify, cURL has the same problem:

$curl = curl_init();
curl_setopt_array($curl, array(CURLOPT_RETURNTRANSFER => true,CURLOPT_URL => 'idontexist.tld'));
curl_exec($curl);
$info = curl_getinfo($curl);
curl_close($curl);
var_dump($info);

It returns the same result as well.
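With cURL, a failed DNS lookup normally shows up as a transport error (`curl_errno()` returns `CURLE_COULDNT_RESOLVE_HOST`) rather than as an HTTP status, so it helps to look at both values. A minimal sketch, assuming the curl extension is loaded; `probeUrl()` is a hypothetical name. Behind a DNS service that answers for nonexistent names, the error code stays 0 and the status is whatever the error-page server sends, which matches what the question observes.

```php
<?php
// Hypothetical helper: send a body-less request (CURLOPT_NOBODY) and report
// both the cURL transport error code and the HTTP status code.
function probeUrl(string $url, int $timeoutSeconds = 10): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // only the status is needed, not the body
        CURLOPT_RETURNTRANSFER => true,  // do not echo the response
        CURLOPT_FOLLOWLOCATION => true,  // report the status of the final response
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    curl_exec($ch);
    // $ch is released automatically when it goes out of scope.
    return [
        'errno'  => curl_errno($ch),                           // 0 on success; non-zero for DNS/connect failures
        'status' => curl_getinfo($ch, CURLINFO_RESPONSE_CODE), // 0 if no HTTP response was received
    ];
}
```

On a resolver that behaves normally, `probeUrl('http://idontexist.tld/')` should come back with a non-zero `errno` and a `status` of 0; a zero `errno` with a 200 status for the same name points to DNS interception.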

Dav*_*dom 11

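The problem has nothing to do with the length of the domain name, only with whether the domain exists.

You are using a DNS service that resolves nonexistent domains to a server which gives you a "friendly" error page and returns it with a 200 response code. This also means the problem is not specific to get_headers(); it affects any process that relies on sane DNS lookups.

One way to handle this, without hard-coding a workaround for every environment it would not work in, might look something like this: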

// A domain that definitely does not exist. The easiest way to guarantee that
// this continues to work is to use an illegal top-level domain (TLD) suffix
$testDomain = 'idontexist.tld';

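// If this resolves to an IP, we know that we are behind a service such as this.
// We can then simply compare the result for the domain we actually test with it.
// (On a sane resolver the lookup fails and gethostbyname() returns $testDomain
// unchanged, so the comparison below will not match a real host.)
$badIP = gethostbyname($testDomain);

// Then, when you want to call get_headers():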
$url = 'http://www.domainnnnnnnnnnnnnnnnnnnnnnnnnnnn.com/CraxyFile.jpg';

$host = parse_url($url, PHP_URL_HOST);
if (gethostbyname($host) === $badIP) {
  // The domain does not exist - probably handle this as if it were a 404
} else {
  // do the actual get_headers() stuff here
}
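The check above can also be packaged as a reusable function. This is a minimal sketch, not the answer's own code: `urlExists()` and its `$resolve` and `$fetch` parameters are hypothetical, and they are injectable only so the logic can be exercised without real network access; by default they fall back to `gethostbyname()` and `get_headers()`.

```php
<?php
/**
 * Hypothetical helper wrapping the sentinel-DNS check.
 *
 * @param string        $url     URL to check
 * @param string        $badIP   gethostbyname() result for a known-nonexistent name
 * @param callable|null $resolve host => IP resolver (defaults to gethostbyname)
 * @param callable|null $fetch   URL => header lines (defaults to get_headers)
 */
function urlExists(string $url, string $badIP, ?callable $resolve = null, ?callable $fetch = null): bool
{
    $resolve = $resolve ?? 'gethostbyname';
    $fetch   = $fetch ?? function (string $u) {
        return @get_headers($u); // @: a failed request returns false with a warning
    };

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no host component, so nothing to look up
    }
    if ($resolve($host) === $badIP) {
        return false; // same answer as a nonexistent name: treat it like a 404
    }

    $headers = $fetch($url);
    if (!is_array($headers)) {
        return false; // request failed outright, e.g. no DNS record or a refused connection
    }
    $status = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 404 Not Found"; after redirects the last one wins.
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status !== null && $status >= 200 && $status < 300;
}
```

Only a 2xx final status counts as "exists" here; whether a 3xx without a followed target, or a 403, should count is a policy choice left to the caller.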

You may want to cache the return value of the first gethostbyname() call in some way: you know you are looking up a name that does not exist, and such lookups often take several seconds.
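One way to do that caching, assuming per-process caching is enough (several checks in one request, or a long-running worker), is a small memoizing wrapper. `SentinelIp` is a hypothetical class; the resolver is injectable so the caching can be checked without touching the network, and it defaults to `gethostbyname()`. Sharing the value across separate requests would need an external cache instead.

```php
<?php
// Hypothetical memoizing wrapper: resolves the known-bad test domain once and
// reuses the answer, since that lookup can take several seconds.
final class SentinelIp
{
    private string $testDomain;
    /** @var callable */
    private $resolve;
    private ?string $ip = null;

    public function __construct(string $testDomain = 'idontexist.tld', ?callable $resolve = null)
    {
        $this->testDomain = $testDomain;
        $this->resolve = $resolve ?? 'gethostbyname';
    }

    public function get(): string
    {
        if ($this->ip === null) {
            // Only the first call performs the (slow) lookup.
            $this->ip = ($this->resolve)($this->testDomain);
        }
        return $this->ip;
    }
}
```

Usage: create one `$sentinel = new SentinelIp();` and compare `gethostbyname($host)` against `$sentinel->get()` wherever the answer compares against `$badIP`.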

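  • @Baba Specifically, it is not `get_headers()`; it is really any function that performs a task on the network based on a name rather than an IP address. But in short, no, it is not unreliable for this reason. It is unreliable for other reasons: it relies on the server handling a `HEAD` request the same way it handles a `GET` request, which in many cases is not a safe assumption (even though, per the standard, it should be). (3 upvotes)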
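For reference, `get_headers()` issues a `GET` request by default; the PHP manual shows sending `HEAD` instead through a stream context. A sketch that makes the method explicit, assuming PHP 7.1+ (where `get_headers()` accepts a context argument); `fetchStatus()` is a hypothetical name. Comparing its result for `GET` and `HEAD` on the same URL is one way to spot servers that answer the two methods differently.

```php
<?php
// Hypothetical helper: fetch headers with an explicit HTTP method and return
// the status code of the final response, or null if the request failed.
function fetchStatus(string $url, string $method = 'GET', float $timeout = 10.0): ?int
{
    $context = stream_context_create([
        'http' => [
            'method'  => $method,  // 'GET' (the get_headers() default) or 'HEAD'
            'timeout' => $timeout, // read timeout in seconds
        ],
    ]);
    // The context argument requires PHP 7.1+; @ suppresses the warning on failure.
    $headers = @get_headers($url, false, $context);
    if (!is_array($headers)) {
        return null;
    }
    $status = null;
    foreach ($headers as $line) {
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1]; // the last status line belongs to the final response
        }
    }
    return $status;
}
```

Usage: `var_dump(fetchStatus($url, 'GET'), fetchStatus($url, 'HEAD'));`. Differing results suggest the server treats `HEAD` specially; neither result helps when DNS itself is answering for nonexistent domains.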