Can lxml be used to check whether an XML file is well-formed, or is it too powerful?

Cel*_*tas 1 xml lxml python-3.x

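Can lxml be used to check whether XML is well-formed, or is it too powerful for that? For example, it seems to be able to parse XML even when the XML is not well-formed. What is the simplest way to check whether an XML file is well-formed?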

har*_*r07 5

By default, lxml should raise an exception when it parses XML that is not well-formed, for example:

from lxml import etree

xml = """
<multipleroot>
    <noclosingtag>
</multipleroot>
<multipleroot></multipleroot>"""
doc = etree.fromstring(xml)

This raises an exception:

Traceback (most recent call last):
  File "D:\StackOverflow\Python\Q50.py", line 8, in <module>
    doc = etree.fromstring(xml)
  ......
  ......
XMLSyntaxError: Opening and ending tag mismatch: noclosingtag line 3 and multipleroot, line 4, column 16
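Building on the behaviour above, a well-formedness check can be sketched as a small helper that parses strictly and catches `XMLSyntaxError`. The name `is_well_formed` and its `(ok, message)` return shape are hypothetical choices for illustration, not part of lxml:

```python
from lxml import etree


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses strictly, else (False, message).

    Hypothetical helper, not an lxml API.
    """
    # recover=False is the XMLParser default; it is spelled out here so the
    # check cannot silently become lenient.
    parser = etree.XMLParser(recover=False)
    try:
        etree.fromstring(xml_text, parser=parser)
    except etree.XMLSyntaxError as exc:
        return False, str(exc)
    return True, None


print(is_well_formed("<root><child/></root>"))
print(is_well_formed("<root><noclosingtag></root>"))
```

For a file on disk, the same pattern should work with `etree.parse(path, parser)` in place of `etree.fromstring`. If the document carries an XML encoding declaration, pass bytes rather than `str`, since lxml rejects such `str` input with a `ValueError` instead of an `XMLSyntaxError`.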

However, lxml may still be able to parse the XML if you explicitly tell XMLParser to recover from non-well-formed input, or if you use HTMLParser instead:

from lxml import etree

xml = """
<multipleroot>
    <noclosingtag>
</multipleroot>
<multipleroot></multipleroot>"""
parser = etree.XMLParser(recover=True)
# Alternatively, use the HTML parser:
# parser = etree.HTMLParser()
doc = etree.fromstring(xml, parser=parser)
print(etree.tostring(doc))

This successfully prints the parsed (recovered) XML:

<multipleroot>
    <noclosingtag>
</noclosingtag>
<multipleroot/></multipleroot>
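If a recovering parser is needed for other reasons, it may still be possible to tell whether the input was well-formed by inspecting the parser's `error_log` after the run. This is a minimal sketch that assumes the log is populated in recover mode; the variable names (`broken`, `problems`, `recovered_cleanly`) are illustrative only:

```python
from lxml import etree

broken = b"<root><noclosingtag></root>"

# recover=True asks the parser to keep going after errors instead of raising.
parser = etree.XMLParser(recover=True)
doc = etree.fromstring(broken, parser=parser)

# error_log holds the errors and warnings from the last parser run.
problems = [(entry.line, entry.message) for entry in parser.error_log]
for line, message in problems:
    print(f"line {line}: {message}")

# Assumption: an empty log means nothing had to be repaired.
recovered_cleanly = len(problems) == 0
print("well-formed" if recovered_cleanly else "recovered from errors")
```

For a plain yes/no check, a strict parser that raises is the simpler choice; the log is mainly useful when the repaired tree is wanted as well.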