lxml iterparse in Python can't handle namespaces

Jam*_*ley 7 python lxml iterparse

from lxml import etree
import StringIO

data= StringIO.StringIO('<root xmlns="http://some.random.schema"><a>One</a><a>Two</a><a>Three</a></root>')
docs = etree.iterparse(data,tag='a')
a,b = docs.next()


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "iterparse.pxi", line 478, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:95348)
  File "iterparse.pxi", line 534, in lxml.etree.iterparse._read_more_events (src/lxml/lxml.etree.c:95938)
StopIteration

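This works fine until I add a namespace to the root node. Any ideas on a workaround, or on the correct way to do this? The files are very large, so I need an event-driven approach.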

jwh*_*ock 10

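When a namespace is attached, the tag is no longer a; it is {http://some.random.schema}a. The default namespace declared on the root element also applies to its children, so tag='a' matches nothing and the iterator is exhausted immediately. Try this: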

from lxml import etree
from io import BytesIO

xml = '''\
<root xmlns="http://some.random.schema">
  <a>One</a>
  <a>Two</a>
  <a>Three</a>
</root>'''
data = BytesIO(xml.encode())
docs = etree.iterparse(data, tag='{http://some.random.schema}a')
for event, elem in docs:
    print(f'{event}: {elem}')

  • You can also use a wildcard to ignore the namespace ([link to docs](https://lxml.de/tutorial.html#namespaces)): `docs = etree.iterparse(data, tag='{*}a')` (2 upvotes)
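
For completeness, here is a minimal sketch of the wildcard form from the comment, applied to the same sample document. The helper name `iter_a_texts` and the `XML` constant are made up for illustration; `etree.QName(elem).localname` is used only to show the tag without its namespace, and `elem.clear()` is an optional step that may help keep memory down on large files.

```python
from io import BytesIO

from lxml import etree

# Same sample document as in the question, as bytes.
XML = (
    b'<root xmlns="http://some.random.schema">'
    b'<a>One</a><a>Two</a><a>Three</a>'
    b'</root>'
)


def iter_a_texts(xml_bytes):
    """Yield (local tag name, text) for every <a> element in any namespace.

    Hypothetical helper for illustration, not part of lxml.
    """
    # '{*}a' matches elements named 'a' regardless of their namespace.
    for event, elem in etree.iterparse(BytesIO(xml_bytes), tag='{*}a'):
        # elem.tag is '{http://some.random.schema}a'; QName strips the namespace.
        yield etree.QName(elem).localname, elem.text
        # Optional: release the element's content once it has been handled.
        elem.clear()


if __name__ == '__main__':
    for name, text in iter_a_texts(XML):
        print(f'{name}: {text}')
```

The explicit `'{http://some.random.schema}a'` form from the answer is stricter, so it is probably the better choice when the document may contain `a` elements from more than one namespace.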