rpr*_*ero 2 python xml decoding elementtree
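I am trying to extract an escaped node from an XML document. The raw text of the node looks like this:
<Notes>{&quot;Phase&quot;: 0, &quot;Flipper&quot;: 0, &quot;Guide&quot;: 0,
&quot;Sample&quot;: 0, &quot;Triangle8&quot;: 0, &quot;Triangle5&quot;: 0,
&quot;Triangle4&quot;: 0, &quot;Triangle7&quot;: 0, &quot;Triangle6&quot;: 0,
&quot;Triangle1&quot;: 0, &quot;Triangle3&quot;: 0, &quot;Triangle2&quot;: 0}</Notes>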
I extract the text as follows:
import xml.etree.ElementTree as ET

infile = ET.parse("C:/userfiles/EXP011/SESAME_60/SESAME_60_runinfo.xml")
r = infile.getroot()
XMLNS = "{http://example.com/foo/bar/runinfo_v4_3}"
x = r.find(".//" + XMLNS + "Notes")
print(x.text)
I expected to get:
{"Phase": 0, "Flipper": 0, "Guide": 0,
"Sample": 0, "Triangle8": 0, "Triangle5": 0,
"Triangle4": 0, "Triangle7": 0, "Triangle6": 0,
"Triangle1": 0, "Triangle3": 0, "Triangle2": 0}
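But instead, I get:
{&quot;Phase&quot;: 0, &quot;Flipper&quot;: 0, &quot;Guide&quot;: 0,
&quot;Sample&quot;: 0, &quot;Triangle8&quot;: 0, &quot;Triangle5&quot;: 0,
&quot;Triangle4&quot;: 0, &quot;Triangle7&quot;: 0, &quot;Triangle6&quot;: 0,
&quot;Triangle1&quot;: 0, &quot;Triangle3&quot;: 0, &quot;Triangle2&quot;: 0}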
How do I get the unescaped string?
In Python 2, use HTMLParser.HTMLParser().unescape():
In [8]: import HTMLParser
In [11]: HTMLParser.HTMLParser().unescape('&quot;')
Out[11]: u'"'
saxutils handles &lt;, &gt; and &amp;, but not &quot;.
In [9]: import xml.sax.saxutils as saxutils
In [10]: saxutils.unescape('&quot;')
Out[10]: '&quot;'
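If you would rather stay with saxutils, its unescape function takes an optional second argument, a mapping of extra entities to replace. The following is a minimal sketch, assuming &quot; is the only entity in the text beyond the three defaults; the sample string is made up for illustration and is not the real file content.

```python
import xml.sax.saxutils as saxutils

# Hypothetical sample resembling the escaped node text from the question.
escaped = '{&quot;Phase&quot;: 0, &quot;Guide&quot;: 0}'

# The second argument maps additional entities to their replacements;
# &lt;, &gt; and &amp; are always handled by unescape itself.
unescaped = saxutils.unescape(escaped, {"&quot;": '"'})
print(unescaped)
```

Every further entity has to be listed by hand in that mapping, so this only suits text with a small, known set of entities.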
Since Python 3.4 you can use html.unescape.
>>> from html import unescape
>>> unescape('&quot;')
'"'
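Applied to the question's case, the whole round trip might look like the sketch below. It rests on an assumption: ElementTree normally decodes &quot; on its own, so text that still contains &quot; suggests the file stores the quotes double-escaped as &amp;quot;. The inline document, the RunInfo root element name, and the shortened Notes content are hypothetical stand-ins for the real file.

```python
import json
import xml.etree.ElementTree as ET
from html import unescape

XMLNS = "{http://example.com/foo/bar/runinfo_v4_3}"

# Stand-in for the real file; assumes the quotes are double-escaped (&amp;quot;).
# The RunInfo root element name is made up for this example.
doc = (
    '<RunInfo xmlns="http://example.com/foo/bar/runinfo_v4_3">'
    '<Notes>{&amp;quot;Phase&amp;quot;: 0, &amp;quot;Guide&amp;quot;: 0}</Notes>'
    '</RunInfo>'
)

root = ET.fromstring(doc)
x = root.find(".//" + XMLNS + "Notes")

raw = x.text              # the parser decodes &amp; once, leaving &quot; in the text
text = unescape(raw)      # a second pass turns &quot; into "
notes = json.loads(text)  # the unescaped text is valid JSON
print(notes)
```

If the Notes content is always JSON, parsing it with json.loads right after unescaping gives a dict to work with instead of a string.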