Fab*_*Fab 4 wget http php curl
To check the HTTP response headers for a set of URLs, I use curl to send the following request headers:
$ch = curl_init();
foreach ( $urls as $url )
{
    // Reset the list for each URL so headers do not pile up across iterations
    $header = array();
    // Setup headers - I used the same headers from Firefox version 2.0.0.6
    $header[ ] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[ ] = "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[ ] = "Cache-Control: max-age=0";
    $header[ ] = "Connection: keep-alive";
    $header[ ] = "Keep-Alive: 300";
    $header[ ] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[ ] = "Accept-Language: en-us,en;q=0.5";
    $header[ ] = "Pragma: "; // browsers keep this blank.
    curl_setopt( $ch, CURLOPT_URL, $url );
    curl_setopt( $ch, CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)');
    curl_setopt( $ch, CURLOPT_HTTPHEADER, $header);
    curl_setopt( $ch, CURLOPT_REFERER, 'http://www.google.com');
    curl_setopt( $ch, CURLOPT_HEADER, true );
    curl_setopt( $ch, CURLOPT_NOBODY, true );
    curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
    curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
    curl_setopt( $ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY );
    curl_setopt( $ch, CURLOPT_TIMEOUT, 10 ); // timeout 10 seconds
    $response = curl_exec( $ch ); // response headers only, because of CURLOPT_NOBODY
}
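Sometimes I get 200 OK, which is fine. Other times I get 301, 302 or 307, which I think is fine too. But sometimes I get odd statuses such as 406, 500 or 504, which should indicate an invalid URL, yet the same URLs open fine in a browser.
For example, the script returns: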
http://www.awe.co.uk/ => HTTP/1.1 406 Not Acceptable
while wget returns:
wget http://www.awe.co.uk/
--2011-06-23 15:26:26-- http://www.awe.co.uk/
Resolving www.awe.co.uk... 77.73.123.140
Connecting to www.awe.co.uk|77.73.123.140|:80... connected.
HTTP request sent, awaiting response... 200 OK
Does anyone know which request headers I am missing or should not have added?
You are including invalid HTTP headers in your request:
$header[ ] = "Accept: text/xml,application/xml,application/xhtml+xml,";
$header[ ] = "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
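In the first line, the list ends with a `,` (that is, an empty content type), and this is the cause of the 406 Not Acceptable error. The second line is not an HTTP header at all, because it has no header name.
If you looked at the Firefox HTTP conversation with a packet sniffer, you probably saw something like this: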
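A quick way to catch this kind of mistake before sending the request is to scan the header array for entries that do not look like one complete header. The following is only a sketch: `findSuspectHeaderLines` is a hypothetical helper written for this illustration, not a curl or PHP function, and its two checks (a missing `Name:` prefix, a trailing comma) cover only the problems described here.

```php
<?php
// Hypothetical helper (not part of curl or PHP): returns the entries of a
// CURLOPT_HTTPHEADER-style array that do not look like one complete header.
function findSuspectHeaderLines(array $headers): array
{
    $suspect = [];
    foreach ($headers as $line) {
        // A header line must start with a name token followed by a colon.
        $hasName = preg_match('/^[A-Za-z0-9!#$%&\'*+.^_`|~-]+:/', $line) === 1;
        // A trailing comma leaves an empty element at the end of a list value.
        $endsWithComma = substr(rtrim($line), -1) === ',';
        if (!$hasName || $endsWithComma) {
            $suspect[] = $line;
        }
    }
    return $suspect;
}

$header = [
    "Accept: text/xml,application/xml,application/xhtml+xml,",
    "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
    "Cache-Control: max-age=0",
];

// Lists the first two entries: one ends with a comma, the other has no name.
print_r(findSuspectHeaderLines($header));
```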
Accept: text/xml,application/xml,application/xhtml+xml,
 text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
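Because the second line begins with whitespace, the server treats the two lines as a single header (a "folded" header). They must likewise be passed to curl as a single header: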
$header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
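If you are copying headers from a sniffer capture, the joining can be done in code rather than by hand. The sketch below assumes the capture is available as one raw string; `unfoldHeaderLines` is a hypothetical helper, not a curl function. It appends every line that starts with a space or tab to the header before it. Folded headers are obsolete in current HTTP specifications (RFC 7230 calls the construct obs-fold), so sending each header on one line is the safer choice anyway.

```php
<?php
// Hypothetical helper: joins folded continuation lines (lines that start
// with a space or tab) onto the header line before them.
function unfoldHeaderLines(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r\n|\n|\r/', $raw) as $line) {
        if ($line === '') {
            continue; // skip blank lines
        }
        $isContinuation = ($line[0] === ' ' || $line[0] === "\t");
        if ($isContinuation && $headers !== []) {
            // Joined without a separator here, which is safe for this
            // comma-separated value; other values may need a single space.
            $headers[count($headers) - 1] .= ltrim($line);
        } else {
            $headers[] = $line;
        }
    }
    return $headers;
}

$sniffed = "Accept: text/xml,application/xml,application/xhtml+xml,\r\n"
         . " text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n"
         . "Accept-Language: en-us,en;q=0.5\r\n";

// $header[0] is the single-line Accept header; the array can then be passed
// to curl_setopt($ch, CURLOPT_HTTPHEADER, $header).
$header = unfoldHeaderLines($sniffed);
print_r($header);
```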
You can use http://echo.opera.com to compare the requests being sent.
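As a local alternative to an echo service, PHP's curl extension can report the request headers it actually sent through the `CURLINFO_HEADER_OUT` option. This is a minimal sketch, not part of the original answer: `sentRequestHeaders` is a hypothetical helper name, and the option set is trimmed to what is needed to read the outgoing headers back.

```php
<?php
// Hypothetical helper: performs a body-less request and returns the raw
// request headers that libcurl actually sent, or null on failure.
function sentRequestHeaders(string $url, array $headers): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    // Ask libcurl to record the outgoing request so it can be read back.
    curl_setopt($ch, CURLINFO_HEADER_OUT, true);

    $ok = curl_exec($ch) !== false;
    $sent = $ok ? curl_getinfo($ch, CURLINFO_HEADER_OUT) : null;

    return is_string($sent) ? $sent : null;
}

// Usage (needs network access, so it is left commented out):
// echo sentRequestHeaders('http://www.awe.co.uk/', $header);
```

Printing the result for the broken and the corrected header arrays side by side shows exactly which lines differ on the wire.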