urllib2.urlopen() vs urllib.urlopen() - urllib2 throws a 404 while urllib works! Why?

16 python url urllib urllib2 http-status-code-404

import urllib

print urllib.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()

The script above works and returns the expected result, whereas:

import urllib2

print urllib2.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()

throws the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/urllib2.py", line 124, in urlopen
    return _opener.open(url, data)
  File "/usr/lib/python2.5/urllib2.py", line 387, in open
    response = meth(req, response)
  File "/usr/lib/python2.5/urllib2.py", line 498, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.5/urllib2.py", line 425, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.5/urllib2.py", line 360, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.5/urllib2.py", line 506, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

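Does anyone know why? I'm running this from a laptop on my home network with no proxy settings - just straight from my laptop to the router and then out to the web.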

Jon*_*erg 35

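That URL does indeed return a 404, but with a lot of HTML content in the body. urllib2 is (correctly) treating it as an error condition. You can recover the content of that site's 404 page like this: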

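import urllib2

try:
    print urllib2.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()
except urllib2.HTTPError, e:  # Python 2 syntax ("except ... as e" also works on 2.6+)
    print e.code       # numeric status code, 404 here
    print e.msg        # reason phrase, "Not Found" here
    print e.headers    # response headers
    print e.fp.read()  # body of the site's 404 page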
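For anyone on Python 3, where urllib2 became urllib.request and urllib.error, the same idea can be sketched as below. This is only an illustration under some assumptions: the `fetch` and `demo` helpers, the `NotFoundHandler` class and the local stand-in server are all hypothetical names made up for this example, so that it runs without depending on the site from the question.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

NOT_FOUND_PAGE = b"<html><body><h1>Page not found</h1></body></html>"


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: answers every GET with a 404 plus an HTML body."""

    def do_GET(self):
        self.send_response(404)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(NOT_FOUND_PAGE)))
        self.end_headers()
        self.wfile.write(NOT_FOUND_PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(url, opener=None):
    """Return (status_code, body) whether or not the server reports an HTTP error."""
    open_url = opener.open if opener is not None else urllib.request.urlopen
    try:
        with open_url(url) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as e:
        # HTTPError doubles as a response object, so the error page is still readable.
        return e.code, e.read()


def demo():
    """Start the stand-in server on a free local port and fetch a page from it."""
    server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler keeps the local request away from any configured proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        url = "http://127.0.0.1:%d/missing/" % server.server_port
        return fetch(url, opener)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    code, body = demo()
    print(code)
    print(body.decode("utf-8"))
```

Calling `fetch(url)` without an opener goes through plain `urllib.request.urlopen`, so it should behave the same way against a real site that serves HTML with its 404.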

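  • This is useful - out of curiosity, when I type this URL into my browser, it works there too. Does that mean the browser is also receiving a 404 but just displaying the content, the way urllib does? (2 upvotes)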
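
On the browser question: a browser does receive the 404 status as well and simply renders whatever body comes with it. The old urllib.urlopen behaves much the same way, because its default error handler returns the response instead of raising. A rough Python 3 sketch of what a client sees at the protocol level, using http.client, which never raises on an error status. The `raw_get` and `demo` helpers and the local `NotFoundHandler` server are hypothetical names used only for illustration, not anything from the site in the question.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

NOT_FOUND_PAGE = b"<html><body><h1>Page not found</h1></body></html>"


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: answers every GET with a 404 plus an HTML body."""

    def do_GET(self):
        self.send_response(404)  # default reason phrase for 404 is "Not Found"
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(NOT_FOUND_PAGE)))
        self.end_headers()
        self.wfile.write(NOT_FOUND_PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def raw_get(host, port, path):
    """Issue a GET and return (status, reason, body) without treating 404 as an error."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.reason, response.read()
    finally:
        conn.close()


def demo():
    """Start the stand-in server on a free local port and query it once."""
    server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return raw_get("127.0.0.1", server.server_port, "/missing/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, reason, body = demo()
    print(status, reason)
    print(body.decode("utf-8"))
```

The status line and the body arrive together in the same response; whether a 404 shows up as an exception or as an ordinary page is purely a choice made by the client library.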