BeautifulSoup, maximum recursion depth exceeded

yay*_*ayu 6 python beautifulsoup

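This is a BeautifulSoup procedure that grabs the text inside all <p> HTML tags. After scraping content from some web pages, I get an error saying that the maximum recursion depth was exceeded.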

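from BeautifulSoup import NavigableString  # assumes BeautifulSoup 3; with bs4: from bs4 import NavigableString

def printText(tags):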
    for tag in tags:
        if tag.__class__ == NavigableString:
            print tag,
        else:
            printText(tag)
    print ""
#loop over urls, send soup to printText procedure

The bottom of the traceback:

  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 16, in printText
    printText(tag)
  File "web_content.py", line 13, in printText
    if tag.__class__ == NavigableString:
RuntimeError: maximum recursion depth exceeded in cmp

Leo*_*son 5

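printText() calls itself recursively whenever it encounters anything other than a NavigableString. That includes subclasses of NavigableString, such as Comment. Calling printText() on a Comment iterates over the characters of the comment's text; each character is a plain string, not a NavigableString, so printText() recurses into it again and the recursion never bottoms out. That is the infinite recursion you are seeing.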
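To see the difference without installing BeautifulSoup, here is a minimal Python 3 sketch. The `NavigableString` and `Comment` classes below are hypothetical local stand-ins, assumed to mirror the library's relationship (Comment subclasses NavigableString, which subclasses the built-in string type); they are not the real library classes.

```python
# Hypothetical stand-ins (not BeautifulSoup's own classes): in the library,
# Comment is a subclass of NavigableString, which is itself a string type.
class NavigableString(str):
    pass


class Comment(NavigableString):
    pass


def is_text_by_class(node):
    # The question's test: exact class equality rejects every subclass.
    return node.__class__ == NavigableString


def is_text_by_isinstance(node):
    # isinstance() accepts NavigableString and all of its subclasses.
    return isinstance(node, NavigableString)


comment = Comment("a comment")
print(is_text_by_class(comment))       # False, so printText() would recurse into it
print(is_text_by_isinstance(comment))  # True

# Iterating a str subclass yields plain str characters, which also fail the
# exact class check, so each character gets recursed into as well.
print(type(next(iter(comment))).__name__)  # str
```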

I suggest using isinstance() in the if statement instead of comparing class objects:

if isinstance(tag, basestring):
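A minimal Python 3 sketch of the same fix, runnable without BeautifulSoup. Python 3 has no `basestring`, so `str` takes its place. Nested lists stand in for tags, and `collect_text`, `NavigableString`, and `Comment` are hypothetical names used only for this illustration; the function returns the text pieces instead of printing them so the result is easy to check.

```python
# Hypothetical stand-ins for BeautifulSoup's string classes (illustration only).
class NavigableString(str):
    pass


class Comment(NavigableString):
    pass


def collect_text(tags):
    """Return every string found under tags, in document order."""
    pieces = []
    for tag in tags:
        if isinstance(tag, str):
            # Any string, including subclasses such as Comment, is a leaf.
            pieces.append(str(tag))
        else:
            # Anything else is assumed to be an iterable container of children.
            pieces.extend(collect_text(tag))
    return pieces


# Lists play the role of tags in this toy document.
doc = [[NavigableString("Hello"), Comment("note")], [[NavigableString("world")]]]
print(collect_text(doc))  # ['Hello', 'note', 'world']
```

As with the `basestring` check, comments come out as text too; skipping them would need an extra `isinstance(tag, Comment)` test.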

I diagnosed the problem by inserting a print statement just before the recursive call:

print "recursing on", tag, type(tag)
printText(tag)
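What that diagnostic reveals can be reproduced in a minimal Python 3 sketch, again with hypothetical stand-in classes. `trace_recursion` is a hypothetical helper that keeps the question's `__class__ ==` test and records the "recursing on" message instead of printing it; the `max_depth` cut-off is an added assumption so the demo stops instead of raising a recursion error.

```python
# Hypothetical stand-ins for BeautifulSoup's string classes (illustration only).
class NavigableString(str):
    pass


class Comment(NavigableString):
    pass


def trace_recursion(tags, depth=0, max_depth=5):
    """Run the question's buggy check and collect the diagnostic messages.

    max_depth is an added safety cut-off, not part of the original code.
    """
    trace = []
    for tag in tags:
        if tag.__class__ == NavigableString:
            continue  # exact NavigableString: treated as text, no recursion
        trace.append(f"recursing on {tag!r} {type(tag).__name__}")
        if depth >= max_depth:
            trace.append("giving up: recursion does not bottom out")
            return trace
        trace.extend(trace_recursion(tag, depth + 1, max_depth))
    return trace


# First the Comment itself, then its character 'x' (a plain str) over and over.
for line in trace_recursion([Comment("x")]):
    print(line)
```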


Ign*_*ams 1

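You're probably hitting a string. Iterating over a string yields strings of length 1. Iterating over one of those length-1 strings yields a length-1 string. Iterating over that length-1 string...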
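A minimal Python 3 sketch of that regress; `first_child` and `naive_walk` are hypothetical helpers for illustration. A walker with no base case for strings never terminates, and Python eventually stops it with a RecursionError (in Python 3.5+ this is a subclass of RuntimeError, the exception type shown in the question's traceback).

```python
def first_child(s):
    """Return the first item produced by iterating over s."""
    return next(iter(s))


s = "abc"
print(first_child(s))               # a
print(first_child(first_child(s)))  # a  (a 1-length string's only item is itself)


def naive_walk(node):
    """Recurse into every item, with no base case for strings."""
    for child in node:
        naive_walk(child)


try:
    naive_walk("abc")
    hit_limit = False
except RecursionError:
    hit_limit = True
print(hit_limit)  # True
```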