Doc*_*cOc 5 python elementtree
I have an XML document that looks like this:
<!-- Servlet Context Listener -->
<listener>
<listener-class>
com.company.servlet.StartupShutdownListener
</listener-class>
</listener>
<!-- Servlet Class Definitions -->
<servlet>
<servlet-name>
AdminServlet
</servlet-name>
<servlet-class>
AdminServlet
</servlet-class>
<load-on-startup>
1
</load-on-startup>
</servlet>
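To make it more readable, I found the indent() function at http://effbot.org/zone/element-lib.htm#prettyprint, which makes the output look better.
However, I would like to go one step further and format just the Comment elements so they stand out. For example, simply adding an extra blank line before and after each comment would make the blocks easier to spot: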
<!-- Servlet Context Listener -->

<listener>
  <listener-class>
    com.company.servlet.StartupShutdownListener
  </listener-class>
</listener>

<!-- Servlet Class Definitions -->

<servlet>
  <servlet-name>AdminServlet</servlet-name>
  <servlet-class>AdminServlet</servlet-class>
  <load-on-startup>1</load-on-startup>
</servlet>
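How can I detect Comment elements inside the indent() function?
After searching the web for others with the same problem and finding nothing, I turned to the source code (https://svn.python.org/projects/python/trunk/Lib/xml/etree/ElementTree.py). The answer turned out to be fairly simple: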
import xml.etree.ElementTree as ET
...
def indent(elem, level=0):  # where elem is of type ET.Element
    ...
    if elem.tag is ET.Comment:
        ...
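The key is to realize that while the "tag" attribute of a regular XML element holds the XML tag name (e.g. "listener" or "servlet"), for an element that represents an XML comment it is the Comment() function itself.
Here is the complete updated indent() function, which performs the comment formatting shown above: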
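The identity check can be tried on its own with a small sketch like the following. The "web-app" wrapper element and the describe() helper are hypothetical names used only for illustration. The second half rests on an assumption about the environment: when parsing, comments are discarded by default, and keeping them requires a TreeBuilder created with insert_comments=True (Python 3.8 or later).

```python
import xml.etree.ElementTree as ET

# Build a tiny tree by hand: one comment followed by one regular element.
root = ET.Element("web-app")  # "web-app" is an assumed wrapper element
root.append(ET.Comment(" Servlet Context Listener "))
ET.SubElement(root, "listener")


def describe(elem):
    """Return 'comment' or 'element' based on the identity of elem.tag."""
    return "comment" if elem.tag is ET.Comment else "element"


kinds = [describe(child) for child in root]
print(kinds)

# When parsing, comments are dropped unless the TreeBuilder is asked to
# keep them (insert_comments requires Python 3.8+).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
parsed = ET.fromstring("<a><!-- note --><b/></a>", parser=parser)
print([describe(child) for child in parsed])
```

Note that the comparison uses `is` rather than `==`: the tag of a comment node is the Comment function object, not a string.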
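import xml.etree.ElementTree as ET

def indent(elem, level=0, prev_elem=None, prev_level=0):
    # prev_elem is elem's parent (when elem is a first child) or its
    # previous sibling; prev_level is that element's nesting level.
    i = "\n" + level * "  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        prev_elem_local = elem
        prev_level_local = level
        for child in elem:
            indent(child, level + 1, prev_elem_local, prev_level_local)
            prev_elem_local = child
            prev_level_local = level + 1
        # The last child's tail dedents so the parent's closing tag lines up.
        if not child.tail or not child.tail.strip():
            child.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i
    # Comment nodes carry the ET.Comment function itself as their tag.
    if elem.tag is ET.Comment and prev_elem is not None:
        if prev_level == level:
            # Previous sibling: the blank line goes into its tail.
            prev_elem.tail = "\n" + prev_elem.tail
        elif prev_level < level:
            # First child: the blank line goes into the parent's text.
            prev_elem.text = "\n" + prev_elem.text
        # Blank line after the comment.
        elem.tail = "\n" + elem.tail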
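On Python 3.9 or later, a similar result can probably be reached without a custom indent(), by letting the standard library's ET.indent() do the indentation and then padding the comments in a second pass. This is an alternative sketch, not the method from the answer above: pad_comments() is a hypothetical helper, the "web-app" root element is assumed, and unlike the function above it also leaves a blank line after a comment that is the last child of its parent.

```python
import xml.etree.ElementTree as ET


def pad_comments(root):
    """Add a blank line before and after each comment in an indented tree."""
    for parent in root.iter():
        prev = None
        for child in parent:
            if child.tag is ET.Comment:
                if prev is None:
                    # First child: whitespace before it lives in parent.text.
                    parent.text = "\n" + (parent.text or "")
                else:
                    # Otherwise it lives in the previous sibling's tail.
                    prev.tail = "\n" + (prev.tail or "")
                child.tail = "\n" + (child.tail or "")
            prev = child


SOURCE = (
    "<web-app>"  # assumed root element
    "<!-- Servlet Context Listener -->"
    "<listener><listener-class>"
    "com.company.servlet.StartupShutdownListener"
    "</listener-class></listener>"
    "<!-- Servlet Class Definitions -->"
    "<servlet><servlet-name>AdminServlet</servlet-name></servlet>"
    "</web-app>"
)

# Keep comments while parsing (insert_comments requires Python 3.8+).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(SOURCE, parser=parser)

ET.indent(root)     # standard-library pretty-printer, Python 3.9+
pad_comments(root)  # second pass: blank lines around comments

pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Splitting the work into two passes keeps the comment handling independent of the indentation logic, at the cost of walking the tree twice.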