Mat*_*ttH 21
It looks like name resolution is ultimately handled by socket.create_connection.
-> urllib2.urlopen
-> httplib.HTTPConnection
-> socket.create_connection
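However, once the "Host:" header has been set, you can resolve the host yourself and pass the IP address down to the opener.
I'd suggest subclassing httplib.HTTPConnection and wrapping its connect method so that self.host is changed before it is passed to socket.create_connection.
Then subclass HTTPHandler (and HTTPSHandler) and replace the http_open method with one that passes your HTTPConnection, rather than httplib's own, to do_open.
Like this: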
import urllib2
import httplib
import socket
import ssl  # needed by MyHTTPSConnection below

def MyResolver(host):
    # Map selected hostnames to a fixed IP; pass everything else through.
    if host == 'news.bbc.co.uk':
        return '66.102.9.104' # Google IP
    else:
        return host

class MyHTTPConnection(httplib.HTTPConnection):
    def connect(self):
        # self.host is left untouched, so the Host: header keeps the original name.
        self.sock = socket.create_connection((MyResolver(self.host),self.port),self.timeout)

class MyHTTPSConnection(httplib.HTTPSConnection):
    def connect(self):
        sock = socket.create_connection((MyResolver(self.host), self.port), self.timeout)
        self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)

class MyHTTPHandler(urllib2.HTTPHandler):
    def http_open(self,req):
        return self.do_open(MyHTTPConnection,req)

class MyHTTPSHandler(urllib2.HTTPSHandler):
    def https_open(self,req):
        return self.do_open(MyHTTPSConnection,req)

# Install the custom handlers so that urllib2.urlopen uses them.
opener = urllib2.build_opener(MyHTTPHandler,MyHTTPSHandler)
urllib2.install_opener(opener)

f = urllib2.urlopen('http://news.bbc.co.uk')
data = f.read()
from lxml import etree
doc = etree.HTML(data)

>>> print doc.xpath('//title/text()')
['Google']
Obviously there are certificate issues if you use HTTPS, and you'll need to fill out MyResolver...
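For anyone on Python 3, the same idea can be sketched roughly as follows. This is only a sketch under a few assumptions: urllib2 and httplib are replaced by urllib.request and http.client, and the names HOST_OVERRIDES, resolve, OverrideHTTPConnection, OverrideHTTPHandler, EchoHostHandler and fetch_via_override are made up for the example. To stay off the network it starts a throwaway local server that echoes the Host: header it receives, and it maps the placeholder name example.invalid to 127.0.0.1. It covers plain HTTP only and ignores proxies and tunnelling.

```python
import http.client
import socket
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical override table: hostname -> address to connect to instead.
HOST_OVERRIDES = {"example.invalid": "127.0.0.1"}


def resolve(host):
    """Return the overridden address for host, or host unchanged."""
    return HOST_OVERRIDES.get(host, host)


class OverrideHTTPConnection(http.client.HTTPConnection):
    def connect(self):
        # Connect to the overridden address. self.host is not modified,
        # so the Host: header still carries the original name.
        self.sock = socket.create_connection(
            (resolve(self.host), self.port), self.timeout
        )


class OverrideHTTPHandler(urllib.request.HTTPHandler):
    def http_open(self, req):
        return self.do_open(OverrideHTTPConnection, req)


class EchoHostHandler(BaseHTTPRequestHandler):
    """Local demo server that replies with the Host header it received."""

    def do_GET(self):
        body = self.headers.get("Host", "").encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_via_override():
    """Fetch from the local server through the overridden hostname."""
    server = HTTPServer(("127.0.0.1", 0), EchoHostHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # ProxyHandler({}) disables environment proxies for this demo only.
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({}), OverrideHTTPHandler
        )
        url = "http://example.invalid:%d/" % port
        with opener.open(url, timeout=5) as response:
            return response.read().decode("ascii"), port
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch_via_override() returns the Host value the server saw together with the port used. The socket goes to 127.0.0.1 while the request still names example.invalid, which is the same effect as in the Python 2 code above.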
Tah*_*gir 17
Another (dirty) way is to monkey-patch socket.getaddrinfo.
For example, this code adds an (unbounded) cache for DNS lookups.
import socket
prv_getaddrinfo = socket.getaddrinfo
dns_cache = {} # or a weakref.WeakValueDictionary()
def new_getaddrinfo(*args):
    try:
        return dns_cache[args]
    except KeyError:
        res = prv_getaddrinfo(*args)
        dns_cache[args] = res
        return res
socket.getaddrinfo = new_getaddrinfo
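A slightly more defensive variant of the same patch might look like the sketch below (Python 3). It makes two changes to the snippet above: the wrapper also accepts keyword arguments, since a *args-only wrapper raises TypeError for callers that pass family= or type= by keyword, and the patch is applied through a context manager so the original function is restored afterwards. The names make_cached_getaddrinfo and patched_getaddrinfo are invented for this example, and the cache is still unbounded and ignores DNS TTLs.

```python
import contextlib
import socket


def make_cached_getaddrinfo(resolver):
    """Wrap a getaddrinfo-like callable with an unbounded in-memory cache."""
    cache = {}

    def cached(*args, **kwargs):
        # Key on positional and keyword arguments so that lookups with
        # different families or socket types do not collide.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = resolver(*args, **kwargs)
        return cache[key]

    cached.cache = cache  # exposed so callers can inspect or clear it
    return cached


@contextlib.contextmanager
def patched_getaddrinfo(replacement):
    """Temporarily replace socket.getaddrinfo, restoring it on exit."""
    original = socket.getaddrinfo
    socket.getaddrinfo = replacement
    try:
        yield replacement
    finally:
        socket.getaddrinfo = original
```

Typical use would be something like `with patched_getaddrinfo(make_cached_getaddrinfo(socket.getaddrinfo)): ...` around the code that makes the requests. It is still a process-wide monkey patch while the block is active, so it affects every thread that resolves names in that window.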