Cel*_*tas 1 xml lxml python-3.x
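Can lxml be used to check whether XML is well-formed, or is it too lenient for that? For example, it seems able to parse XML even when it is not well-formed. What is the simplest way to check whether an XML file is well-formed?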
By default, lxml raises an exception when it parses XML that is not well-formed. For example:
from lxml import etree
xml = """
<multipleroot>
<noclosingtag>
</multipleroot>
<multipleroot></multipleroot>"""
doc = etree.fromstring(xml)
This raises an exception:
Traceback (most recent call last):
  File "D:\StackOverflow\Python\Q50.py", line 8, in <module>
    doc = etree.fromstring(xml)
......
......
XMLSyntaxError: Opening and ending tag mismatch: noclosingtag line 3 and multipleroot, line 4, column 16
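Because the default parser raises `XMLSyntaxError` on malformed input, a well-formedness check can be built by catching that exception. The following is a minimal sketch, not a definitive implementation; the helper name `is_well_formed` and its return shape are assumptions made for illustration and are not part of lxml.

```python
from lxml import etree


def is_well_formed(xml_text):
    """Hypothetical helper: return (True, None) if xml_text parses with the
    default strict parser, otherwise (False, error message)."""
    try:
        etree.fromstring(xml_text)
    except etree.XMLSyntaxError as exc:
        return False, str(exc)
    return True, None


# Same malformed input as in the example above.
broken = """
<multipleroot>
<noclosingtag>
</multipleroot>
<multipleroot></multipleroot>"""

ok, message = is_well_formed(broken)
print(ok, message)
print(is_well_formed("<root><child/></root>"))
```

Note that passing a `str` that contains an XML encoding declaration makes `etree.fromstring` raise `ValueError` rather than `XMLSyntaxError`; in that case, pass the document as `bytes` instead.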
However, lxml may still be able to parse the XML if you explicitly tell the XMLParser to recover from non-well-formed XML, or if you use the HTMLParser instead:
from lxml import etree
xml = """
<multipleroot>
<noclosingtag>
</multipleroot>
<multipleroot></multipleroot>"""
parser = etree.XMLParser(recover=True)
#parser = etree.HTMLParser()
doc = etree.fromstring(xml, parser=parser)
print(etree.tostring(doc))
This successfully prints the parsed XML:
<multipleroot>
<noclosingtag>
</noclosingtag>
<multipleroot/></multipleroot>
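A recovering parser hides the fact that the input was malformed, so it can be useful to inspect what it repaired. The sketch below relies on the parser's `error_log` property, which records the problems from the parser's last run; the helper name `parse_with_recovery` is an assumption made for illustration and is not part of lxml.

```python
from lxml import etree


def parse_with_recovery(xml_text):
    """Hypothetical helper: parse leniently and return the root element
    together with the problems the parser reported while recovering."""
    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(xml_text, parser=parser)
    # error_log holds the entries logged during this parser's last run.
    problems = [(entry.line, entry.message) for entry in parser.error_log]
    return root, problems


# Same malformed input as in the example above.
broken = """
<multipleroot>
<noclosingtag>
</multipleroot>
<multipleroot></multipleroot>"""

root, problems = parse_with_recovery(broken)
for line, message in problems:
    print(f"line {line}: {message}")
```

A non-empty `problems` list indicates that the document was not well-formed, even though parsing produced a tree.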