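Is there anything special I need to consider when trying to change the user agent with httr::user_agent in an httr::GET() call on MS Windows? I am using R-3.1.0 and httr 0.3.
Following the example in ?user_agent, I get these results: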
url_this <- "http://httpbin.org/user-agent"
Default user agent:
GET(url_this)
Response [http://httpbin.org/user-agent]
Status: 200
Content-type: application/json
{
"user-agent": "curl/7.19.6 Rcurl/1.95.4.1 httr/0.3"
}
Modified user agent:
GET(url_this, user_agent("Mozilla/5.0"))
Response [http://httpbin.org/user-agent]
Status: 200
Content-type: application/json
{
"user-agent": "curl/7.19.6 Rcurl/1.95.4.1 httr/0.3"
}
I had expected the second call to return something closer to what I get when I visit url_this in a browser:
{
"user-agent": "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0"
}
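What am I missing here? I also tried running setInternet2(TRUE) first, but the result was the same.
This is very curious, because the help page ?user_agent suggests it should work. You can set the header explicitly, and that does work: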
> GET("http://httpbin.org/user-agent", add_headers("user-agent" = "Mozilla/5.0"))
Response [http://httpbin.org/user-agent]
Status: 200
Content-type: application/json
{
"user-agent": "Mozilla/5.0"
}
But the example given in ?user_agent does not seem to:
> GET("http://httpbin.org/user-agent", user_agent("Mozilla/5.0") )
Response [http://httpbin.org/user-agent]
Status: 200
Content-type: application/json
{
"user-agent": "curl/7.19.6 Rcurl/1.95.4.1 httr/0.3"
}
>
Instead, it is returning the default user agent:
> httr:::default_ua()
[1] "curl/7.19.7 Rcurl/1.95.4.1 httr/0.3"
My ISP also does some funky things, so you may additionally need:
GET("http://httpbin.org/user-agent", add_headers("user-agent" = "Mozilla/5.0", "Cache-Control" = "no-cache"))
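If it helps, here is a rough sketch for checking whether the user agent you asked for is the one the server actually saw. The helper names extract_ua, ua_was_applied and check_ua are made up for this illustration and are not part of httr. It assumes the endpoint echoes the header back as JSON with a "user-agent" key, as httpbin.org does in the output above, and it uses the add_headers() workaround rather than user_agent().

```r
# Hypothetical helper: pull the "user-agent" value out of the JSON text
# that the server echoes back. Returns NA if the key is not found.
extract_ua <- function(json_text) {
  pattern <- '"user-agent"[[:space:]]*:[[:space:]]*"([^"]*)"'
  m <- regmatches(json_text, regexec(pattern, json_text))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

# Hypothetical helper: TRUE only if the echoed user agent is exactly
# the string that was requested.
ua_was_applied <- function(json_text, wanted) {
  identical(extract_ua(json_text), wanted)
}

# Hypothetical wrapper: send the request with the header set explicitly
# via add_headers(), then compare what came back. Needs httr and network
# access, so it is only defined here, not called.
check_ua <- function(url, wanted) {
  resp <- httr::GET(url, httr::add_headers("user-agent" = wanted))
  ua_was_applied(httr::content(resp, as = "text"), wanted)
}
```

Calling check_ua("http://httpbin.org/user-agent", "Mozilla/5.0") should then tell you whether the override took effect on your machine. If a proxy or your ISP is caching responses, you may need to add the "Cache-Control" header shown above to the add_headers() call as well.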