I have some XML fragments like this:
<!DOCTYPE mensaje SYSTEM "record.dtd">
<record>
<player_birthday>1979-09-23</player_birthday>
<player_name>Orene Ai'i</player_name>
<player_team>Blues</player_team>
<player_id>453</player_id>
<player_height>170</player_height>
<player_position>F&W</player_position> <---- a '&' here.
<player_weight>75</player_weight>
</record>
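Is there a way to check whether an XML fragment is well-formed? Is there a way to validate XML against a DTD or an XML Schema?
For various reasons, I cannot use any third-party packages.
For example, the XML above is not well-formed because it contains a bare '&'. Note that the DOCTYPE declaration refers to a DTD.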
jsb*_*eno 30
Just try parsing it with ElementTree (xml.etree.ElementTree.fromstring); if the XML is not well-formed, it will raise an error.
>>> a = """<record>
... <player_birthday>1979-09-23</player_birthday>
... <player_name>Orene Ai'i</player_name>
... <player_team>Blues</player_team>
... <player_id>453</player_id>
... <player_height>170</player_height>
... <player_position>F&W</player_position> <---- a '&' here.
... <player_weight>75</player_weight>
... </record>"""
>>>
>>> from xml.etree import ElementTree as ET
>>> x = ET.fromstring(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1282, in XML
    parser.feed(text)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1624, in feed
    self._raiseerror(v)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1488, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 7, column 24
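If you want a yes/no answer instead of a traceback, the same ElementTree call can be wrapped in a small helper. This is only a sketch: the function name `is_well_formed` and the two sample strings are made up for illustration and are not part of any library.

```python
from xml.etree import ElementTree as ET


def is_well_formed(xml_text):
    """Hypothetical helper: return (True, None) if xml_text parses,
    otherwise (False, parser error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The message carries the line and column reported by the parser.
        return False, str(err)
    return True, None


# Illustrative inputs: a bare '&' versus the escaped form '&amp;'.
bad = "<record><player_position>F&W</player_position></record>"
good = "<record><player_position>F&amp;W</player_position></record>"

print(is_well_formed(bad))
print(is_well_formed(good))
```

Note that this checks well-formedness only. The expat-based parsers in the standard library are non-validating, so the DTD named in a DOCTYPE is not applied to the document.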
You can use Python's xml.dom.minidom XML parser (it is in the standard library, but it is not as powerful as alternatives such as lxml).
Just do:
import xml.dom.minidom
xml.dom.minidom.parseString('<My><XML><String/><XML/><My/>')
You will get an xml.parsers.expat.ExpatError if the XML is not well-formed.
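A minimal sketch of catching that exception, assuming you would rather report the problem than let it propagate. The helper name `parse_or_report` is hypothetical; the second sample string is an assumed corrected version of the one above.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def parse_or_report(xml_text):
    """Hypothetical helper: return the parsed DOM document,
    or None after printing the parser's error message."""
    try:
        return xml.dom.minidom.parseString(xml_text)
    except ExpatError as err:
        # The message includes the line and column of the failure.
        print("not well-formed: %s" % err)
        return None


# '<XML/>' and '<My/>' are self-closing tags, not closing tags,
# so the first string leaves <My> and <XML> unclosed.
broken = parse_or_report('<My><XML><String/><XML/><My/>')
fixed = parse_or_report('<My><XML><String/></XML></My>')
```

Here `broken` ends up as None and `fixed` is a Document object. As with ElementTree, this tells you whether the text is well-formed, not whether it is valid against a DTD.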