Posts by jor*_*bas

Why does this element in lxml include the tail?

Consider the following Python script:

from lxml import etree

html = '''
<html xmlns="http://www.w3.org/1999/xhtml">
<head></head>
  <body>
    <p>This is some text followed with 2 citations.<span class="footnote">1</span>
       <span class="footnote">2</span>This is some more text.</p>
  </body>
</html>'''

tree = etree.fromstring(html)

for element in tree.findall(".//{*}span"):
    if element.get("class") == 'footnote':
        print(etree.tostring(element, encoding="unicode", pretty_print=True))

The desired output is just the 2 span elements, but instead I get:

<span xmlns="http://www.w3.org/1999/xhtml" class="footnote">1</span>
<span xmlns="http://www.w3.org/1999/xhtml" class="footnote">2</span>This is some more text.

Why does the output include the text that follows the element, all the way to the end of the parent element?

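I am trying to link the footnotes using lxml. When I call a.insert() to put the span element into the a element I created for it, the span brings the following text along with it, so a lot of text gets linked that I don't want linked.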
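
One possible way around this, sketched under the assumption that the trailing text lives in the span's `tail`: detach the tail before moving the span, then reattach it to the new `a` element so the text stays outside the link. The helper name `wrap_in_link`, the `#fn1` href, and the namespace-free fragment are hypothetical and only for illustration.

```python
from lxml import etree

def wrap_in_link(span, href):
    """Wrap span in a new <a> element, keeping the span's tail text outside the link.

    Hypothetical helper for illustration; not part of lxml.
    """
    parent = span.getparent()
    a = etree.Element("a", href=href)

    # Detach the tail first so it does not travel with the span.
    tail = span.tail
    span.tail = None

    # Put <a> at the span's position, then move the span inside it.
    parent.insert(parent.index(span), a)
    a.append(span)

    # The text that followed the span now follows the link instead.
    a.tail = tail
    return a

# Simplified fragment (an assumption, not the original document).
p = etree.fromstring('<p>Some text.<span class="footnote">1</span> More text.</p>')
wrap_in_link(p.find("span"), "#fn1")
result = etree.tostring(p, encoding="unicode")
print(result)
```

With the XHTML document from the question, the new element would presumably need to be created in the same namespace, for example with a `{http://www.w3.org/1999/xhtml}a` tag.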

html python lxml

jor*_*bas

2013-11-22

5
votes

1
answer

426
views