Rad*_*Hex 45 python proxy http urllib2
I know that I should set the HTTP_PROXY environment variable to the proxy address.
In general urllib works fine; the problem is with urllib2.
>>> urllib2.urlopen("http://www.google.com").read()
returns
urllib2.URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>
or
urllib2.URLError: <urlopen error [Errno 11004] getaddrinfo failed>
I tried @Fenikso's answer, but now I get this error:
URLError: <urlopen error [Errno 10060] A connection attempt failed because the
connected party did not properly respond after a period of time, or established
connection failed because connected host has failed to respond>
Any ideas?
Fen*_*kso 61
You can do it even without the HTTP_PROXY environment variable. Try this sample:
import urllib2
proxy_support = urllib2.ProxyHandler({"http":"http://61.233.25.166:80"})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
html = urllib2.urlopen("http://www.google.com").read()
print html
In your case it looks like the proxy server is refusing the connection.
Something more to try:
import urllib2
# proxy = "61.233.25.166:80"
proxy = "YOUR_PROXY_GOES_HERE"  # host:port of your proxy
proxies = {"http": "http://%s" % proxy}
url = "http://www.google.com/search?q=test"
headers = {"User-agent": "Mozilla/5.0"}
proxy_support = urllib2.ProxyHandler(proxies)
# debuglevel=1 prints the request and response headers, which helps with debugging
opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler(debuglevel=1))
urllib2.install_opener(opener)
req = urllib2.Request(url, None, headers)
html = urllib2.urlopen(req).read()
print html
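For readers on Python 3, where urllib2 was split into urllib.request and urllib.error, the same approach might look like the sketch below. The helper names build_proxy_opener and fetch are hypothetical, the proxy address is whatever you pass in, and the 10-second timeout is an arbitrary assumption. The explicit timeout and the URLError handling are there so that a refusing or unresponsive proxy reports the underlying reason instead of hanging.

```python
import urllib.error
import urllib.request


def build_proxy_opener(proxy, debuglevel=0):
    """Return an opener that sends http:// requests through `proxy`.

    `proxy` is a "host:port" string (a placeholder; use your own proxy).
    """
    proxies = {"http": "http://%s" % proxy}
    return urllib.request.build_opener(
        urllib.request.ProxyHandler(proxies),
        urllib.request.HTTPHandler(debuglevel=debuglevel),
    )


def fetch(url, proxy, timeout=10):
    """Fetch `url` through `proxy`; return the body bytes, or None on failure."""
    opener = build_proxy_opener(proxy)
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        # opener.open() uses this opener only; nothing is installed globally.
        with opener.open(req, timeout=timeout) as resp:
            return resp.read()
    except urllib.error.URLError as exc:
        # exc.reason describes what went wrong (connection refused, DNS failure, ...)
        print("Request failed:", exc.reason)
        return None
```

It could be called as fetch("http://www.google.com", "YOUR_PROXY_GOES_HERE"). Using opener.open() instead of install_opener() keeps the proxy setting local to this one opener, which may be preferable when only some requests should go through the proxy.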
Edit 2014:
This seems to be a popular question/answer. But today I would use the third-party requests module instead.
For one request, just do:
import requests
r = requests.get("http://www.google.com",
proxies={"http": "http://61.233.25.166:80"})
print(r.text)
For multiple requests, use a Session object so you do not have to add the proxies parameter to every request:
import requests
s = requests.Session()
s.proxies = {"http": "http://61.233.25.166:80"}
r = s.get("http://www.google.com")
print(r.text)
小智 16
I recommend you just use the requests module.
It is much easier than the built-in HTTP clients: http://docs.python-requests.org/en/latest/index.html
Sample usage:
import requests
r = requests.get('http://www.thepage.com', proxies={"http":"http://myproxy:3129"})
thedata = r.content
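Just wanted to mention that you may also have to set the https_proxy OS environment variable if you need to access HTTPS URLs. In my case this was not obvious, and I spent hours trying before I discovered it.
My use case: Win 7, jython-standalone-2.5.3.jar, setuptools installed via ez_setup.py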
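To check which proxy settings Python actually picks up from the environment, something like the sketch below may help. It is written for Python 3, where the function is urllib.request.getproxies (in Python 2 it is urllib.getproxies), so it is not the Jython setup described above. The proxy address is a placeholder, not a real server.

```python
import os
import urllib.request

# Placeholder proxy address; replace with your own.
proxy_url = "http://myproxy:3129"

# Set both variables: http_proxy covers http:// URLs, https_proxy covers https:// URLs.
os.environ["http_proxy"] = proxy_url
os.environ["https_proxy"] = proxy_url

# getproxies() returns a dict mapping each URL scheme to the proxy found for it.
detected = urllib.request.getproxies()
print(detected.get("http"), detected.get("https"))
```

If the "https" entry is missing from the result, requests to https:// URLs will not go through the proxy, which matches the symptom described above.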