use*_*022 · 2 · Tags: python, loops, while-loop
I'm currently writing a simple crawler in Python 2.7 using urllib2. Here is the downloader class:
import urllib2

class Downloader:
    def __init__(self, limit=3):
        self.limit = limit

    def downloadGet(self, url):
        request = urllib2.Request(url)
        retry = 0
        succ = False
        page = None
        # Try up to self.limit times; stop at the first successful read
        while retry < self.limit:
            print "Retry: " + str(retry) + " Limit:" + str(self.limit)
            try:
                response = urllib2.urlopen(request)
                page = response.read()
                succ = True
                break
            except:
                retry += 1
        return succ, page
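The same retry logic can be written with a for loop over range(), which makes the upper bound on attempts explicit. This is a minimal sketch, written to run on both Python 2.7 and 3. The names download_with_retries and fetch are hypothetical: fetch stands in for the urlopen/read step so the logic can be exercised offline. It also deliberately swaps the bare except for except Exception, which is a change from the code above.

```python
def download_with_retries(fetch, url, limit=3):
    """Call fetch(url) at most `limit` times and return (succeeded, page).

    `fetch` is a hypothetical stand-in for the urllib2.urlopen(...).read()
    step, so the retry logic can be tested without network access.
    """
    # range() requires an integer, so a string limit such as '3' raises
    # TypeError right here instead of producing an unbounded loop.
    for _ in range(limit):
        try:
            return True, fetch(url)
        except Exception:
            # Catch ordinary errors only; unlike a bare except, this lets
            # KeyboardInterrupt and SystemExit propagate.
            pass
    return False, None
```

Because the loop counter is managed by range(), there is no separate retry variable that could drift out of sync with the bound.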
Each URL is tried up to three times. I'm also using multiple threads; the thread code is as follows:
from threading import Thread

class DownloadThread(Thread):
    def __init__(self, requestGet, limit):
        Thread.__init__(self)
        self.requestGet = requestGet
        self.downloader = Downloader(limit)

    def run(self):
        # Keep pulling URLs until requestGet() signals there are none left
        while True:
            url = self.requestGet()
            if url is None:
                break
            ret = self.download(url)
            print ret

    def download(self, url):
        # some other stuff
        succ, flv = self.downloader.downloadGet(url)
        return succ
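The question doesn't show how requestGet is implemented. One common, thread-safe way to hand URLs to worker threads is a queue.Queue with one None sentinel per worker, mirroring requestGet() returning None. This is a minimal sketch under that assumption, not the original author's code; run_workers and download are hypothetical names, and it runs on Python 2.7 (Queue module) or 3 (queue module).

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2.7


def run_workers(urls, download, num_threads=5):
    """Download every URL using num_threads worker threads; return {url: result}."""
    tasks = queue.Queue()
    for url in urls:
        tasks.put(url)
    # One stop sentinel per worker, like requestGet() returning None.
    for _ in range(num_threads):
        tasks.put(None)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            url = tasks.get()  # blocks until an item is available
            if url is None:
                break
            ret = download(url)
            with lock:
                results[url] = ret

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Since the queue is FIFO and every URL is enqueued before the sentinels, each worker drains real work first and then consumes exactly one None.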
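However, in my experiments with the number of threads set to 5, the downloader does not stop after three attempts. For some threads the output even shows "Retry: 4280 Limit:3". It looks as if the while condition is being ignored.
Any help or suggestions are welcome. Thanks!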
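One possible cause of the infinite loop in downloadGet is that limit is a string object.
If limit is a string, then in Python 2.x `retry < self.limit` always evaluates to True, because CPython 2 orders numbers before objects of other types when comparing mismatched types: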
>>> retry = 4280
>>> limit = '3'
>>> retry < limit
True
Check the type of the limit value you pass in.
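One way to rule this out is to normalise limit once, in the constructor, so every later comparison is int-to-int. This is a minimal sketch; only __init__ is shown (downloadGet would stay as in the question), and the ValueError and its message are an illustrative choice rather than anything from the original. For comparison, Python 3 would raise TypeError on `retry < self.limit` with a string limit instead of silently returning True.

```python
class Downloader(object):
    def __init__(self, limit=3):
        # Convert once at construction: '3' becomes 3, so the later
        # comparison retry < self.limit always compares two ints.
        try:
            self.limit = int(limit)
        except (TypeError, ValueError):
            # Fail early with a clear message instead of looping forever.
            raise ValueError("limit must be an integer, got %r" % (limit,))
```

Usage: Downloader('3').limit is the integer 3, while Downloader('three') raises ValueError at construction time rather than inside a worker thread.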