Bel*_*ial 19 python urllib python-3.x
I am trying to open and parse an HTML page. In Python 2.7.8 I have no problem:
import urllib
url = "https://ipdb.at/ip/66.196.116.112"
html = urllib.urlopen(url).read()
Everything works fine. However, I want to move to Python 3.4, and there I get HTTP error 403 (Forbidden). My code:
import urllib.request
html = urllib.request.urlopen(url) # same URL as before
File "C:\Python34\lib\urllib\request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "C:\Python34\lib\urllib\request.py", line 461, in open
response = meth(req, response)
File "C:\Python34\lib\urllib\request.py", line 574, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python34\lib\urllib\request.py", line 499, in error
return self._call_chain(*args)
File "C:\Python34\lib\urllib\request.py", line 433, in _call_chain
result = func(*args)
File "C:\Python34\lib\urllib\request.py", line 582, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
It works for other URLs that do not use https.
url = 'http://www.stopforumspam.com/ipcheck/212.91.188.166'
That one works fine.
fal*_*tru 32
It looks like the site does not like the user agent of Python 3.x.
Specifying a User-Agent will solve your problem:
import urllib.request
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(req).read()
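To see what the site is reacting to, here is a minimal sketch that inspects the headers without sending anything over the network. The helper names `default_user_agent` and `make_browser_like_request` are hypothetical, introduced only for illustration; `build_opener`, `addheaders`, `Request`, and `get_header` are standard `urllib.request` APIs.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent urllib.request sends when none is specified."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples added to every request.
    return dict(opener.addheaders).get("User-agent")


def make_browser_like_request(url, user_agent="Mozilla/5.0"):
    """Build a Request whose User-Agent overrides the urllib default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Nothing is sent here; the Request object is only constructed and inspected.
req = make_browser_like_request("https://ipdb.at/ip/66.196.116.112")
print(default_user_agent())          # a string of the form Python-urllib/<major>.<minor>
print(req.get_header("User-agent"))  # Request stores header names capitalized
```

Passing `req` to `urllib.request.urlopen` then sends the overridden header, as in the answer's snippet.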
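Note: the Python 2.x urllib version also receives a 403 status, but unlike Python 2.x urllib2 and Python 3.x urllib, it does not raise an exception.
You can confirm this with the following code: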
print(urllib.urlopen(url).getcode()) # => 403
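In Python 3.x the same check has to catch the exception, because `urllib.request.urlopen` raises `urllib.error.HTTPError` for a 403 instead of returning the response. A rough sketch follows; the function name `status_code` and its `opener` parameter are assumptions made for illustration (the parameter only exists so the function can be exercised without a network connection).

```python
import urllib.error
import urllib.request


def status_code(url, opener=urllib.request.urlopen):
    """Return the HTTP status for url, whether or not urllib raises."""
    try:
        response = opener(url)
    except urllib.error.HTTPError as err:
        # Error statuses such as 403 arrive as an exception carrying the code.
        return err.code
    try:
        return response.getcode()
    finally:
        response.close()
```

Calling `status_code(url)` with the URL from the question should report the same 403 that the Python 2.x one-liner prints, assuming the site still rejects the default user agent.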