Python urllib2 returns an empty string

ip.*_*ip. 2 python urllib2

I am trying to retrieve the following URL: http://www.winkworth.co.uk/sale/property/flat-for-sale-in-masefield-court-london-n5/HIH140004.

import urllib2
response = urllib2.urlopen('http://www.winkworth.co.uk/rent/property/terraced-house-to-rent-in-mill-road--/WOT140129')
response.read()

But I get an empty string. When I try it in a browser or with cURL, it works fine. Any idea what is going on?

Mar*_*ers 11

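I got a response when using the requests library, but not when using urllib2, so I experimented with the HTTP request headers.

It turns out the server requires an Accept header; urllib2 does not send one, whereas requests and cURL both send */*.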
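One way to confirm which headers a client sends is to point it at a throwaway local server that records them. The following is only a sketch under assumptions: it uses Python 3, where urllib2 was folded into urllib.request, and the names `RecordingHandler` and `seen` are made up for illustration. It records the Accept header of two requests, one plain and one with the header set explicitly.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen = []  # Accept header value of each request (None if the header is absent)


class RecordingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that records the Accept header of every GET."""

    def do_GET(self):
        seen.append(self.headers.get('Accept'))
        body = b'ok'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


# Port 0 lets the operating system pick a free port.
server = HTTPServer(('127.0.0.1', 0), RecordingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

# An empty ProxyHandler bypasses any proxy configured in the environment.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

# First request: library defaults only.
opener.open(url).read()
# Second request: Accept header set explicitly.
opener.open(urllib.request.Request(url, headers={'Accept': '*/*'})).read()

server.shutdown()
server.server_close()
print(seen)
```

The first recorded entry reflects the library defaults, the second the explicit header, so the difference between the two requests is visible without touching the remote site.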

Send one with urllib2 as well:

import urllib2

url = 'http://www.winkworth.co.uk/sale/property/flat-for-sale-in-masefield-court-london-n5/HIH140004'
# Pass an explicit Accept header; urllib2 does not add one by default.
req = urllib2.Request(url, headers={'accept': '*/*'})
response = urllib2.urlopen(req)

Demo:

>>> import urllib2
>>> url = 'http://www.winkworth.co.uk/sale/property/flat-for-sale-in-masefield-court-london-n5/HIH140004'
>>> len(urllib2.urlopen(url).read())
0
>>> request = urllib2.Request(url, headers={'accept': '*/*'})
>>> len(urllib2.urlopen(request).read())
37197
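For readers on Python 3, where urllib2 no longer exists and its functionality lives in urllib.request, the same fix might look roughly like this. This is a sketch, not something from the original answer; the network call is left commented out, since whether the server still behaves this way is not something the snippet can guarantee.

```python
import urllib.request

url = 'http://www.winkworth.co.uk/sale/property/flat-for-sale-in-masefield-court-london-n5/HIH140004'

# Attach an explicit Accept header to the request object.
req = urllib.request.Request(url, headers={'Accept': '*/*'})

# Inspect the header that will be sent, without touching the network.
print(req.get_header('Accept'))

# Performing the request needs network access:
# with urllib.request.urlopen(req) as response:
#     body = response.read()
```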

The server is at fault here; RFC 2616 states:

If no Accept header field is present, then it is assumed that the client accepts all media types.