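I wrote a crawler to pull information from a Q&A website. Since not every field is always present on a page, I used several try-except blocks to handle the missing ones.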
    def answerContentExtractor( loginSession, questionLinkQueue , answerContentList) :
        while True:
            URL = questionLinkQueue.get()
            try:
                response = loginSession.get(URL,timeout = MAX_WAIT_TIME)
                raw_data = response.text
                #These fields must exist, or something went wrong...
                questionId = re.findall(REGEX,raw_data)[0]
                answerId = re.findall(REGEX,raw_data)[0]
                title = re.findall(REGEX,raw_data)[0]
            except requests.exceptions.Timeout ,IndexError:
                print >> sys.stderr, URL + " extraction error..."
                questionLinkQueue.task_done()
                continue
            try:
                questionInfo = re.findall(REGEX,raw_data)[0]
            except IndexError:
                questionInfo = ""
            try:
                answerContent = re.findall(REGEX,raw_data)[0]
            except IndexError:
                answerContent = ""
            result = {
                'questionId' : questionId,
                'answerId' : answerId,
                'title' : title,
                'questionInfo' : questionInfo,
                'answerContent': answerContent
            }
            answerContentList.append(result)
            questionLinkQueue.task_done()
At runtime, this code intermittently raises the following exception:

    UnboundLocalError: local variable 'IndexError' referenced before assignment
The line number in the traceback points to the second except IndexError: clause.
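Thanks for all the suggestions. I would gladly give each of you the accepted mark you deserve; too bad I can only mark one answer as correct...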
I think the problem is this line:

    except requests.exceptions.Timeout ,IndexError
In Python 2, this is equivalent to:

    except requests.exceptions.Timeout as IndexError:
So you are binding the caught requests.exceptions.Timeout exception to the name IndexError. This code demonstrates the rebinding:

    try:
        true
    except NameError, IndexError:
        print IndexError
    #name 'true' is not defined
To catch multiple exceptions, use a tuple:

    except (requests.exceptions.Timeout, IndexError):
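As a rough illustration of the tuple form in Python 3 syntax, the sketch below wraps a single required-field lookup. The Timeout class is a local stand-in for requests.exceptions.Timeout so the snippet runs without the requests library, and extract_title, fetch, slow_fetch, the URLs, and the title pattern are hypothetical names that do not come from the original crawler.

```python
import re


class Timeout(Exception):
    """Local stand-in for requests.exceptions.Timeout (assumption for this sketch)."""


def extract_title(fetch, url):
    """Return the first <title> match, or None on a timeout or a missing field."""
    try:
        raw_data = fetch(url)  # hypothetical callable that returns the page text
        return re.findall(r"<title>(.*?)</title>", raw_data)[0]
    except (Timeout, IndexError):  # the tuple catches either type and rebinds nothing
        return None


def slow_fetch(url):
    """Hypothetical fetcher that always times out."""
    raise Timeout(url)


found = extract_title(lambda url: "<title>Q&A</title>", "http://example.invalid/1")
missing = extract_title(lambda url: "<html></html>", "http://example.invalid/2")
timed_out = extract_title(slow_fetch, "http://example.invalid/3")
print(found, missing, timed_out)
```

Because the handler only reads the two exception classes, the built-in IndexError stays available to any later except IndexError: clause in the same function.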
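As for the UnboundLocalError: because IndexError is assigned inside your function, Python treats it as a local variable for the whole function body, so reading its value before it has actually been assigned raises UnboundLocalError.

    >>> 'IndexError' in answerContentExtractor.func_code.co_varnames
    True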
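The same check can be tried on Python 3, where the attribute is spelled __code__ rather than func_code and the rebinding form has to be written with as. This is only a minimal sketch; rebinding_handler and tuple_handler are made-up names used for comparison.

```python
def rebinding_handler():
    try:
        pass
    except NameError as IndexError:  # binds the name IndexError inside this function
        pass


def tuple_handler():
    try:
        pass
    except (NameError, IndexError):  # only reads the built-in IndexError
        pass


# co_varnames lists the names the compiler decided are local to each function.
is_local_when_rebound = "IndexError" in rebinding_handler.__code__.co_varnames
is_local_with_tuple = "IndexError" in tuple_handler.__code__.co_varnames
print(is_local_when_rebound, is_local_with_tuple)
```

The decision is made when the function is compiled, which is why it does not matter whether the rebinding handler ever actually runs.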
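So if the handler on this line (requests.exceptions.Timeout ,IndexError) is never executed at runtime, the IndexError name used further down is still unbound and raises UnboundLocalError. That also explains why the error only shows up sometimes: it needs a run where the first handler did not fire. Sample code that reproduces the error:

    def func():
        try:
            print
        except NameError, IndexError:
            pass
        try:
            [][1]
        except IndexError:
            pass
    func()
    #UnboundLocalError: local variable 'IndexError' referenced before assignment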
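For readers on Python 3, here is a hedged translation of that reproduction next to the tuple-based fix. The names fragile and robust are illustrative only, and except NameError as IndexError: is used as the Python 3 spelling of the Python 2 comma form.

```python
def fragile():
    try:
        pass  # nothing is raised, so the handler below never binds IndexError
    except NameError as IndexError:  # still makes IndexError a local name
        pass
    try:
        return [][1]
    except IndexError:  # looks up the unbound local, not the built-in
        return "handled"


def robust():
    try:
        pass
    except (NameError, IndexError):  # tuple form: no name is rebound
        pass
    try:
        return [][1]
    except IndexError:  # resolves to the built-in exception class
        return "handled"


try:
    fragile()
    outcome = "no error"
except UnboundLocalError:
    outcome = "UnboundLocalError"

print(outcome)
print(robust())
```

The two functions differ only in the first except clause, which isolates the rebinding as the cause of the failure.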