Big*_*337 29 python python-requests
Here is the script:
import requests
import json
import urlparse
from requests.adapters import HTTPAdapter
s = requests.Session()
s.mount('http://', HTTPAdapter(max_retries=1))
with open('proxies.txt') as proxies:
    for line in proxies:
        proxy=json.loads(line)
        with open('urls.txt') as urls:
            for line in urls:
                url=line.rstrip()
                data=requests.get(url, proxies=proxy)
                data1=data.content
                print data1
                print {'http': line}
As you can see, it tries to access a list of URLs through a list of proxies. Here is the urls.txt file:
http://api.exip.org/?call=ip
Here is the proxies.txt file:
{"http":"http://107.17.92.18:8080"}
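I got this proxy from www.hidemyass.com. Could it be a bad proxy? I have tried several, and this is always the result. Note: if you try to reproduce this, you may have to replace the proxy with a recent one from hidemyass.com, since they seem to stop working after a while. Here is the full error and traceback: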
Traceback (most recent call last):
  File "test.py", line 17, in <module>
    data=requests.get(url, proxies=proxy)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 335, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 454, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 144, in resolve_redirects
    allow_redirects=False,
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 438, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 327, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host=u'219.231.143.96', port=18186): Max retries exceeded with url: http://www.google.com/ (Caused by <class 'httplib.BadStatusLine'>: '')
Eug*_*Loy 33
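Looking at the stack trace you provided, the error is caused by an httplib.BadStatusLine exception, which, according to the docs, is:
Raised if a server responds with a HTTP status code that we don't understand.
In other words, whatever the proxy server returned (if it returned anything at all) could not be parsed by httplib, which performs the actual request.
From my experience with (writing) HTTP proxies, I can say that some implementations do not follow the specs very strictly (the RFC specs on HTTP are not easy reading, actually) or use hacks to work around old browsers with flawed implementations.
So, to answer the question:
Could it be a bad proxy?
... I would say that it is possible. The only way to be sure is to look at what the proxy server actually returns.
Try stepping through it with a debugger, or use a packet sniffer (something like Wireshark or Network Monitor) to analyze what happens on the network. Knowing exactly what the proxy server returns should give you the key to solving this issue.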
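If a packet sniffer is not at hand, talking to the proxy over a raw socket gives a rough view of what it sends back. The following is only a minimal sketch in Python 3 syntax; the helper names `build_proxy_request`, `fetch_raw_via_proxy` and `looks_like_status_line` are made up for illustration and are not part of requests or httplib, and the timeout and byte limit are arbitrary assumptions.

```python
import socket
from urllib.parse import urlsplit


def build_proxy_request(url):
    """Build the bytes of a plain GET as a client would send it to a forward proxy.

    A forward proxy expects the absolute URL in the request line.
    """
    host = urlsplit(url).netloc
    lines = [
        "GET {} HTTP/1.1".format(url),
        "Host: {}".format(host),
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw_via_proxy(proxy_host, proxy_port, url, timeout=10, max_bytes=4096):
    """Return the first bytes the proxy sends back, without parsing them."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(build_proxy_request(url))
        chunks = []
        received = 0
        while received < max_bytes:
            chunk = sock.recv(max_bytes - received)
            if not chunk:  # the proxy closed the connection
                break
            chunks.append(chunk)
            received += len(chunk)
    return b"".join(chunks)


def looks_like_status_line(raw):
    """True if the response starts with something like 'HTTP/1.1 200 OK'."""
    first_line = raw.split(b"\r\n", 1)[0]
    return first_line.startswith(b"HTTP/")


# Example call (needs a reachable proxy, so it is left commented out):
# raw = fetch_raw_via_proxy("107.17.92.18", 8080, "http://api.exip.org/?call=ip")
# print(repr(raw), looks_like_status_line(raw))
```

An empty result would be consistent with the `BadStatusLine: ''` in the traceback, since it suggests the proxy closed the connection before sending any status line. Output that does not start with `HTTP/` would point to a proxy that is not speaking plain HTTP on that port.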
Anonymous 8
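Maybe you are overloading the proxy server by sending too many requests in a short period of time. You say you got the proxy from a popular free proxy website, which means you are not the only one using that server, and it is often under heavy load.
If you add some delay between your requests, like this: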
from time import sleep
[...]
data=requests.get(url, proxies=proxy)
data1=data.content
print data1
print {'http': line}
sleep(1)
(Note that sleep(1) pauses execution of the code for one second.)
Does it work?
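Since free proxies fail often, it may also help to record a failure and move on rather than let one bad proxy abort the whole run. Here is a rough sketch of that idea in Python 3 syntax. The names `load_proxies`, `fetch_all` and `requests_fetch` are hypothetical helpers, the 10 second timeout is an assumption, and the `except Exception` is deliberately broad for a sketch (in real code you would probably narrow it to `requests.exceptions.RequestException`).

```python
import json
import time


def load_proxies(lines):
    """Parse one JSON proxy mapping per line, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def fetch_all(urls, proxies, fetch, delay=1.0, sleep=time.sleep):
    """Try every URL through every proxy, pausing after each request.

    `fetch(url, proxy)` performs the actual request. Failures are recorded
    as (proxy, url, None, exception) instead of stopping the loop.
    """
    results = []
    for proxy in proxies:
        for url in urls:
            try:
                results.append((proxy, url, fetch(url, proxy), None))
            except Exception as exc:  # broad on purpose, see the note above
                results.append((proxy, url, None, exc))
            sleep(delay)  # give a busy proxy some breathing room
    return results


def requests_fetch(url, proxy):
    """Fetch with requests; imported here so the helpers above need only the stdlib."""
    import requests
    return requests.get(url, proxies=proxy, timeout=10).content


# Example call (needs the two files from the question and network access):
# with open('proxies.txt') as p, open('urls.txt') as u:
#     proxies = load_proxies(p)
#     urls = [line.strip() for line in u if line.strip()]
# for proxy, url, body, error in fetch_all(urls, proxies, requests_fetch):
#     print(proxy, url, error if error else body)
```

Passing `fetch` and `sleep` in as parameters is just a convenience so the loop can be tried out without a network connection.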