Python urllib2 cannot fetch certain URLs

ehs*_*adi 1 python urllib2 httprequest

I am using urllib2 to request URLs and read their contents, but unfortunately it does not work for some URLs. Look at these commands:

import urllib2

# No problem with this URL
urllib2.urlopen('http://www.huffingtonpost.com/2014/07/19/todd-akin-slavery_n_5602083.html')
# This one raises an error
urllib2.urlopen('http://www.foxnews.com/us/2014/07/19/cartels-suspected-as-high-caliber-gunfire-sends-border-patrol-scrambling-on-rio/')

The second URL produces the following error:

Traceback (most recent call last):
  File "D:/Developer Center/Republishan/republishan2/republishan2/test.py", line 306, in <module>
    urllib2.urlopen('http://www.foxnews.com/us/2014/07/19/cartels-suspected-as-high-caliber-gunfire-sends-border-patrol-scrambling-on-rio/')
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

What is wrong here?

Wal*_*lly 6

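I think the site is checking the User-Agent and/or other headers, which urllib2 does not set to browser-like values by default (it identifies itself as Python-urllib).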

You can set the User-Agent manually.
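As a minimal sketch of setting the header by hand with urllib2: the helper names `build_request` and `fetch` and the browser-like User-Agent string below are assumptions chosen for illustration, not something from the question, and it is not certain that this particular site accepts them. The import fallback is only there so the same code also runs on Python 3.

```python
try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent module

# Assumed browser-like value; any realistic User-Agent string could be used.
BROWSER_USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Firefox/31.0"


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request object that carries an explicit User-Agent header."""
    return urllib2.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=BROWSER_USER_AGENT, timeout=10):
    """Open the URL with the custom User-Agent and return the raw body."""
    response = urllib2.urlopen(build_request(url, user_agent), timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

Calling `fetch(...)` with the failing URL from the question would then send the browser-like header instead of the default one. If the site filters on other headers as well, they can be added to the same `headers` dictionary.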

The requests library sets a User-Agent automatically.

But keep in mind that some sites may also block the User-Agent that requests sends.

Try this. It worked for me. You need to install the requests module first!

pip install requests

Then:

import requests

r = requests.get("http://www.foxnews.com/us/2014/07/19/cartels-suspected-as-high-caliber-gunfire-sends-border-patrol-scrambling-on-rio/")

print(r.text)

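urllib is harder to use and requires more code. requests is simpler and more Pythonic, in keeping with the philosophy that code should be beautiful!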