Proxy check in Python

Mar*_*coW 16 python proxy http

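I wrote a script in Python that uses cookies and POST/GET, and I also added proxy support to it. However, when someone enters a dead proxy, the script crashes. Is there a way to check whether a proxy is dead or alive before running the rest of my script?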

Also, I noticed that some proxies don't handle cookies/POST headers correctly. Is there any way to fix this?
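On the cookie/POST part: nothing on the client side can repair a proxy that strips or rewrites headers. What you can do is make sure cookies and POST bodies always go through the same proxy, then test a proxy for that round trip before relying on it. A minimal Python 3 sketch, assuming a plain HTTP proxy given as "host:port"; build_proxy_opener and post_form are hypothetical helper names, not part of any library:

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_proxy_opener(proxy):
    """Return (opener, jar): every request made with opener, GET or POST, goes via proxy.

    The CookieJar stores Set-Cookie headers from responses and sends them back
    on later requests, so cookies travel through the same proxy as the data.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy}),
        urllib.request.HTTPCookieProcessor(jar),
    )
    return opener, jar


def post_form(opener, url, fields, timeout=10):
    """POST the dict `fields` as application/x-www-form-urlencoded; return the body."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    with opener.open(url, data=data, timeout=timeout) as resp:
        return resp.read()
```

If a site sets a cookie on the first POST and the jar is still empty afterwards, the proxy is probably dropping Set-Cookie headers and can be discarded.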

dbr*_*dbr 17

The simplest way is to catch the IOError exception raised by urllib:

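import urllib  # Python 2 only; urllib.urlopen does not exist in Python 3

try:
    urllib.urlopen(
        "http://example.com",
        proxies={'http': 'http://example.com:8080'}
    )
except IOError:
    # Raised when the proxy cannot be reached or the connection fails
    print "Connection error! (Check proxy)"
else:
    print "All was fine"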
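The snippet above is Python 2 only. In Python 3, IOError is an alias of OSError, and urllib.error.URLError (with its subclass HTTPError) derives from it, so the same "just catch the error" approach looks roughly like this. It is a sketch: fetch_via_proxy is a hypothetical helper name, and http.client.HTTPException is caught as well because a misbehaving proxy can send a malformed status line, which can surface as that exception rather than as URLError.

```python
import http.client
import urllib.request


def fetch_via_proxy(url, proxy, timeout=10):
    """Fetch url through an HTTP proxy ("host:port"); return the body, or None on failure."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({"http": proxy}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.read()
    except (OSError, http.client.HTTPException) as exc:
        # URLError, HTTPError and socket timeouts are OSError subclasses in Python 3;
        # HTTPException covers malformed responses that urllib does not wrap.
        print("Connection error! (Check proxy):", exc)
        return None
```

Unlike Python 2's urllib.urlopen, opener.open raises HTTPError for 4xx/5xx responses, so a proxy answering 407 or 502 is also reported as a failure here.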

Also, adapted (slightly improved) from this blog post, "Check Status Proxy Address":

For Python 2:

import urllib2
import socket

def is_bad_proxy(pip):
    """Return a truthy value (True or the HTTP error code) if the proxy fails, else False."""
    try:
        proxy_handler = urllib2.ProxyHandler({'http': pip})
        opener = urllib2.build_opener(proxy_handler)
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        # Use the opener directly instead of urllib2.install_opener(), which would
        # route every later urllib2.urlopen() call through the last proxy tested.
        req = urllib2.Request('http://www.example.com')  # change the URL to test here
        opener.open(req).read()
    except urllib2.HTTPError, e:
        print 'Error code: ', e.code
        return e.code
    except Exception, detail:
        print "ERROR:", detail
        return True
    return False

def main():
    socket.setdefaulttimeout(120)

    # two sample proxy IPs
    proxyList = ['125.76.226.9:80', '213.55.87.162:6588']

    for currentProxy in proxyList:
        if is_bad_proxy(currentProxy):
            print "Bad Proxy %s" % (currentProxy)
        else:
            print "%s is working" % (currentProxy)

if __name__ == '__main__':
    main()

For Python 3:

import urllib.request
import socket
import urllib.error

def is_bad_proxy(pip):
    """Return a truthy value (True or the HTTP error code) if the proxy fails, else False."""
    try:
        proxy_handler = urllib.request.ProxyHandler({'http': pip})
        opener = urllib.request.build_opener(proxy_handler)
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        # Use the opener directly; urllib.request.install_opener() would make every
        # later urlopen() call in the script go through the last proxy tested.
        req = urllib.request.Request('http://www.example.com')  # change the URL to test here
        with opener.open(req) as resp:
            resp.read()
    except urllib.error.HTTPError as e:
        print('Error code: ', e.code)
        return e.code
    except Exception as detail:
        print("ERROR:", detail)
        return True
    return False

def main():
    socket.setdefaulttimeout(120)

    # two sample proxy IPs
    proxyList = ['125.76.226.9:80', '25.176.126.9:80']

    for currentProxy in proxyList:
        if is_bad_proxy(currentProxy):
            print("Bad Proxy %s" % (currentProxy))
        else:
            print("%s is working" % (currentProxy))

if __name__ == '__main__':
    main() 

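Keep in mind that if the proxy is down, this can double the time your script takes, because you have to wait for two connection timeouts (one for the check, one for the real request). Unless you specifically need to know that the proxy is at fault, simply handling the IOError is cleaner, simpler and faster.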
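If the separate check is worth it, its cost can be reduced by giving each request its own short timeout instead of the global socket.setdefaulttimeout(120), and by checking several proxies in parallel. A Python 3 sketch using concurrent.futures; is_proxy_alive, check_proxies, the test URL and the 5-second default are assumptions to adjust:

```python
import concurrent.futures
import http.client
import urllib.request


def is_proxy_alive(proxy, test_url="http://www.example.com", timeout=5):
    """Return True if test_url can be fetched through the HTTP proxy "host:port"."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({"http": proxy}))
    try:
        # timeout applies to each blocking socket operation, not to the whole request
        with opener.open(test_url, timeout=timeout) as resp:
            resp.read()
        return True
    except (OSError, http.client.HTTPException):
        return False


def check_proxies(proxies, max_workers=10, **kwargs):
    """Check proxies in parallel threads; return {proxy: bool} in input order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda p: is_proxy_alive(p, **kwargs), proxies)
        return dict(zip(proxies, results))
```

For example, check_proxies(['125.76.226.9:80', '213.55.87.162:6588']) returns a dict mapping each proxy to True or False, and the total wait is roughly one timeout rather than one per dead proxy (as long as there are no more proxies than workers). A proxy that trickles data slowly can still exceed timeout, since the limit is per socket operation.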

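  • But some proxies can connect to that URL without returning the actual HTML from it; they show a custom error page instead, so there is no exception to catch. Wouldn't it be better to check for a string in req.read()? (3 upvotes)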
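The comment's suggestion can be sketched like this: fetch a page whose content you know, and treat the proxy as bad unless the body contains an expected marker. The function name, URL and marker are assumptions; for instance, b"Example Domain" should appear on http://www.example.com, but verify the marker against the real page before relying on it.

```python
import http.client
import urllib.request


def proxy_serves_real_page(proxy, test_url, expected, timeout=10):
    """Return True only if the page fetched via proxy contains the bytes `expected`.

    Catches proxies that accept the connection but answer with their own
    error page, which raises no exception.
    """
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({"http": proxy}))
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            body = resp.read()
    except (OSError, http.client.HTTPException):
        return False  # dead proxy, HTTP error status, or malformed response
    return expected in body
```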