I'm currently writing a simple crawler in Python 2.7 using urllib2. This is the downloader class:
    import urllib2

    class Downloader:
        def __init__(self, limit=3):
            self.limit = limit

        def downloadGet(self, url):
            request = urllib2.Request(url)
            retry = 0
            succ = False
            page = None
            while retry < self.limit:
                print "Retry: " + str(retry) + " Limit: " + str(self.limit)
                try:
                    response = urllib2.urlopen(request)
                    page = response.read()
                    succ = True
                    break
                except urllib2.URLError:
                    # catch URLError instead of a bare except, which would
                    # also swallow KeyboardInterrupt and SystemExit
                    retry += 1
            return succ, page
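For comparison, the same retry pattern can be sketched in Python 3 with `urllib.request` (the `urllib2` successor). This is a minimal sketch, not the original class: the `opener` parameter is an assumption added here so the retry logic can be exercised without network access, and `download_get` is a hypothetical name.

```python
import urllib.request

def download_get(url, limit=3, opener=urllib.request.urlopen):
    """Fetch url, retrying up to `limit` times; returns (success, page)."""
    page = None
    for retry in range(limit):
        try:
            response = opener(url)
            page = response.read()
            return True, page
        except IOError:
            # urllib.error.URLError subclasses OSError, so this catches
            # network failures but not KeyboardInterrupt/SystemExit
            continue
    return False, page
```

Injecting the opener keeps the retry logic testable: a fake opener that fails twice and then succeeds should yield `(True, page)` with `limit=3` and `(False, None)` with `limit=1`.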
Each URL is tried up to three times. I also use multiple threads; the thread code is as follows:
    from threading import Thread

    class DownloadThread(Thread):
        def __init__(self, requestGet, limit):
            Thread.__init__(self)
            self.requestGet = requestGet
            self.downloader = Downloader(limit)

        def run(self):
            while True:
                url = self.requestGet()
                if …
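The `run()` loop is cut off above, but the usual shape of such a worker is to pull URLs from a shared source until a sentinel signals there is no more work. A minimal sketch of that pattern, assuming `requestGet` is a `Queue.get` and `None` is the end-of-work sentinel (the `worker` function and the sentinel convention are assumptions, not the original code):

```python
import queue
import threading

def worker(get_next, download, results):
    """Pull URLs until a None sentinel; record (url, success) per download."""
    while True:
        url = get_next()
        if url is None:  # sentinel: no more work for this thread
            break
        succ, page = download(url)
        results.append((url, succ))

# Wiring: a queue.Queue supplies URLs, and q.get plays the role of
# requestGet; one None sentinel is enqueued per worker thread.
```

With this wiring, each thread blocks on `q.get()` until a URL or a sentinel arrives, so the main thread can shut the pool down cleanly by enqueueing one `None` per worker.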