在Parsed XML中忠实地保留注释(Python 2.7)

Jam*_*son 13 python xml elementtree python-2.7

在操作XML时,我希望尽可能忠实地保留注释.

我设法保留了评论,但内容正在进行XML转义.

#!/usr/bin/env python
# add_host_to_tomcat.py

import xml.etree.ElementTree as ET
from CommentedTreeBuilder import CommentedTreeBuilder
parser = CommentedTreeBuilder()

if __name__ == '__main__':
    filename = "/opt/lucee/tomcat/conf/server.xml"

    # this is the important part: use the comment-preserving parser
    tree = ET.parse(filename, parser)

    # get the node to add a child to
    engine_node = tree.find("./Service/Engine")

    # add a node: Engine.Host
    host_node = ET.SubElement(
        engine_node,
        "Host",
        name="local.mysite.com",
        appBase="webapps"
    )
    # add a child to new node: Engine.Host.Context
    ET.SubElement(
        host_node,
        'Context',
        path="",
        docBase="/path/to/doc/base"
    )

    tree.write('out.xml')
Run Code Online (Sandbox Code Playgroud)
#!/usr/bin/env python
# CommentedTreeBuilder.py

from xml.etree import ElementTree

class CommentedTreeBuilder ( ElementTree.XMLTreeBuilder ):
    def __init__ ( self, html = 0, target = None ):
        ElementTree.XMLTreeBuilder.__init__( self, html, target )
        self._parser.CommentHandler = self.handle_comment

    def handle_comment ( self, data ):
        self._target.start( ElementTree.Comment, {} )
        self._target.data( data )
        self._target.end( ElementTree.Comment )
Run Code Online (Sandbox Code Playgroud)

但是,评论如下:

  <!--
EXAMPLE HOST ENTRY:
    <Host name="lucee.org" appBase="webapps">
         <Context path="" docBase="/var/sites/getrailo.org" />
     <Alias>www.lucee.org</Alias>
     <Alias>my.lucee.org</Alias>
    </Host>

HOST ENTRY TEMPLATE:
    <Host name="[ENTER DOMAIN NAME]" appBase="webapps">
         <Context path="" docBase="[ENTER SYSTEM PATH]" />
     <Alias>[ENTER DOMAIN ALIAS]</Alias>
    </Host>
  -->
Run Code Online (Sandbox Code Playgroud)

最终为:

  <!--
            EXAMPLE HOST ENTRY:
    &lt;Host name="lucee.org" appBase="webapps"&gt;
         &lt;Context path="" docBase="/var/sites/getrailo.org" /&gt;
         &lt;Alias&gt;www.lucee.org&lt;/Alias&gt;
         &lt;Alias&gt;my.lucee.org&lt;/Alias&gt;
    &lt;/Host&gt;

    HOST ENTRY TEMPLATE:
    &lt;Host name="[ENTER DOMAIN NAME]" appBase="webapps"&gt;
         &lt;Context path="" docBase="[ENTER SYSTEM PATH]" /&gt;
         &lt;Alias&gt;[ENTER DOMAIN ALIAS]&lt;/Alias&gt;
    &lt;/Host&gt;
   -->
Run Code Online (Sandbox Code Playgroud)

我也尝试self._target.data( saxutils.unescape(data) )CommentedTreeBuilder.py,但它似乎没有做任何事情.事实上,我认为问题发生在handle_commment()步骤之后的某个地方.

顺便说一句,这个问题与类似.

Mar*_*gur 21

使用Python 2.7和3.5进行测试,以下代码应按预期工作.

#!/usr/bin/env python
# CommentedTreeBuilder.py
from xml.etree import ElementTree

class CommentedTreeBuilder(ElementTree.TreeBuilder):
    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)
Run Code Online (Sandbox Code Playgroud)

然后,在主代码中使用

parser = ET.XMLParser(target=CommentedTreeBuilder())
Run Code Online (Sandbox Code Playgroud)

作为解析器而不是当前的解析器.

顺便说一句,评论开箱即用lxml.也就是说,你可以做到

import lxml.etree as ET
tree = ET.parse(filename)
Run Code Online (Sandbox Code Playgroud)

不需要任何上述内容.

  • xml和lxml都默认省略了&lt;?xml ...?&gt;和&lt;!DOCTYPE ...&gt;声明。http://stackoverflow.com/q/18173983/2997179和http://stackoverflow.com/q/12966488/2997179提供了解决方案。 (3认同)

ber*_*nie 8

Python 3.8 added the insert_comments argument to TreeBuilder which:

class xml.etree.ElementTree.TreeBuilder(element_factory=None, *, comment_factory=None, pi_factory=None, insert_comments=False, insert_pis=False)

When insert_comments and/or insert_pis is true, comments/pis will be inserted into the tree if they appear within the root element (but not outside of it).

Example:

parser = ElementTree.XMLParser(target=ElementTree.TreeBuilder(insert_comments=True))
Run Code Online (Sandbox Code Playgroud)

  • 使用Python3.8为我工作 (2认同)

小智 5

Martin's Code 对我不起作用。我修改了以下内容,按预期工作。

import xml.etree.ElementTree as ET

class CommentedTreeBuilder(ET.XMLTreeBuilder):
    def __init__(self, *args, **kwargs):
        super(CommentedTreeBuilder, self).__init__(*args, **kwargs)
        self._parser.CommentHandler = self.comment

    def comment(self, data):
        self._target.start(ET.Comment, {})
        self._target.data(data)
        self._target.end(ET.Comment)
Run Code Online (Sandbox Code Playgroud)

这是测试

    parser=CommentedTreeBuilder()
    tree = ET.parse(filename, parser)
    tree.write('out.xml')
Run Code Online (Sandbox Code Playgroud)


小智 5

马丁的答案是正确的,只是缺少一些代码,我知道这对于更有经验的程序员来说可能是显而易见的,但作为一个新程序员,我花了一分钟才理解:马丁的答案:

import xml.etree.ElementTree as ET
from xml.etree import ElementTree

class CommentedTreeBuilder(ElementTree.TreeBuilder):
# This class will retain remarks and comments opposed to the xml parser default
    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)

# the missing part:
def parse_xml_with_remarks(filepath):
    ctb = CommentedTreeBuilder()
    xp = ET.XMLParser(target=ctb)
    tree = ET.parse(filepath, parser=xp)
    return tree

# parsing the file, and getting root
tree=parse_xml_with_remarks(file)
root=tree.getroot()
Run Code Online (Sandbox Code Playgroud)


归档时间:

查看次数:

6979 次

最近记录:

5 年,10 月 前