Recursive XML parsing in Python with ElementTree

Asked by Ara*_*Ara (score 7) · Tags: python, xml, recursion, elementtree

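I'm trying to parse the XML below with Python's ElementTree and produce the output shown further down. I'm trying to write a module that walks the top-level elements and prints them. However, it's a bit tricky because a category element may or may not have attributes, and a category element may contain another category element inside it.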
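On the "may or may not have attributes" point: `Element.get()` returns a default instead of raising `KeyError` the way `elem.attrib['name']` does. A minimal sketch, assuming that a category without a `name` attribute should fall back to its `<displayName>` child (the `label` helper and the `subApp.misc` value are hypothetical, not part of work.xml):

```python
import xml.etree.ElementTree as ET

with_attr = ET.fromstring('<category id="2423" name="about"/>')
# Hypothetical element with no attributes at all.
without_attr = ET.fromstring('<category><displayName>subApp.misc</displayName></category>')

def label(elem):
    # Element.get() returns the default instead of raising KeyError, so a
    # missing 'name' attribute falls back to the <displayName> child's text.
    return elem.get('name', elem.findtext('displayName'))

print(label(with_attr))     # about
print(label(without_attr))  # subApp.misc
```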

I've looked at earlier questions on this topic, but none of them involve nested elements that share the same name.
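The nested same-name case is where ElementTree's two lookup styles differ. A minimal sketch on a cut-down, hypothetical fragment of work.xml: `iter('category')` walks the whole subtree (including the element itself) and finds every category but flattens the hierarchy, while `findall('category')` matches direct children only, which is what a level-by-level recursive walk needs.

```python
import xml.etree.ElementTree as ET

XML = """<category name="about">
    <category name="comms"/>
</category>"""

root = ET.fromstring(XML)

# iter() visits root and all descendants in document order: nesting is lost.
flat = [c.get('name') for c in root.iter('category')]

# findall('category') only looks one level down: nesting is preserved.
direct = [c.get('name') for c in root.findall('category')]

print(flat)    # ['about', 'comms']
print(direct)  # ['comms']
```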

My code: http://pastebin.com/Fsv2Xzqf

work.xml:
<suite id="1" name="MainApplication">
    <displayNameKey>my Application</displayNameKey>
    <displayName>my Application</displayName>
    <application id="2" name="Sub Application1">
        <displayNameKey>sub Application1</displayNameKey>
        <displayName>sub Application1</displayName>
        <category id="2423" name="about">
            <displayNameKey>subApp.about</displayNameKey>
            <displayName>subApp.about</displayName>
            <category id="2423" name="comms">
                <displayNameKey>subApp.comms</displayNameKey>
                <displayName>subApp.comms</displayName>
                <property id="5909" name="copyright" type="string_property" width="40">
                    <value>2014</value>
                </property>
                <property id="5910" name="os" type="string_property" width="40">
                    <value>Linux 2.6.32-431.29.2.el6.x86_64</value>
                </property>
            </category>
            <property id="5908" name="releaseNumber" type="string_property" width="40">
                <value>9.1.0.3.0.54</value>
            </property>
        </category>
    </application>
</suite>

The output should look like this:

Suite: MainApplication
    Application: Sub Application1
        Category: about
            property: releaseNumber | 9.1.0.3.0.54
            category: comms
                property: copyright | 2014
                property: os | Linux 2.6.32-431.29.2.el6.x86_64

Any pointers in the right direction would help.

Answer by Jat*_*mar (score 9)

import xml.etree.ElementTree as ET
tree = ET.ElementTree(file='work.xml')

indent = 0
ignoreElems = ['displayNameKey', 'displayName']

def printRecur(root):
    """Recursively prints the tree."""
    global indent  # must be declared before indent is used in this function
    if root.tag in ignoreElems:
        return
    print(' ' * indent + '%s: %s' % (root.tag.title(), root.attrib.get('name', root.text)))
    indent += 4
    for elem in root:  # iterate children directly; getchildren() was removed in Python 3.9
        printRecur(elem)
    indent -= 4

root = tree.getroot()
printRecur(root)

OUTPUT:

Suite: MainApplication
    Application: Sub Application1
        Category: about
            Category: comms
                Property: copyright
                    Value: 2014
                Property: os
                    Value: Linux 2.6.32-431.29.2.el6.x86_64
            Property: releaseNumber
                Value: 9.1.0.3.0.54

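This is the closest I could get in 5 minutes. The idea is simply to call a processing function recursively and let it take care of each element. You can improve on it from here :)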
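As one possible refinement toward the exact format the question asks for (each category's own properties printed before its nested categories, with the value on the same line as the property name), here is a minimal sketch. It assumes only `application` and `category` elements contain further structure; `render`, `CONTAINERS`, and `WORK_XML` are hypothetical names introduced here. One small difference from the question's sample: nested categories print as `Category:` rather than lowercase `category:`.

```python
import xml.etree.ElementTree as ET

# Tags assumed to contain further nested structure.
CONTAINERS = ('application', 'category')

def render(elem, depth=0):
    """Return output lines for elem: its properties first, then nested containers."""
    pad = ' ' * (4 * depth)
    lines = [pad + '%s: %s' % (elem.tag.title(), elem.get('name'))]
    # findall('property') matches direct children only, so a category never
    # picks up the properties of a category nested inside it.
    for prop in elem.findall('property'):
        value = prop.findtext('value', default='')
        lines.append(pad + '    property: %s | %s' % (prop.get('name'), value))
    # Recurse into nested containers; displayName/displayNameKey are skipped
    # because they are not in CONTAINERS.
    for child in elem:
        if child.tag in CONTAINERS:
            lines.extend(render(child, depth + 1))
    return lines

# The question's work.xml, inlined so the sketch runs on its own.
WORK_XML = """<suite id="1" name="MainApplication">
    <displayName>my Application</displayName>
    <application id="2" name="Sub Application1">
        <displayName>sub Application1</displayName>
        <category id="2423" name="about">
            <displayName>subApp.about</displayName>
            <category id="2423" name="comms">
                <displayName>subApp.comms</displayName>
                <property id="5909" name="copyright" type="string_property" width="40">
                    <value>2014</value>
                </property>
                <property id="5910" name="os" type="string_property" width="40">
                    <value>Linux 2.6.32-431.29.2.el6.x86_64</value>
                </property>
            </category>
            <property id="5908" name="releaseNumber" type="string_property" width="40">
                <value>9.1.0.3.0.54</value>
            </property>
        </category>
    </application>
</suite>"""

lines = render(ET.fromstring(WORK_XML))
print('\n'.join(lines))
```

To read from the file instead, pass `ET.parse('work.xml').getroot()` to `render`. Passing `depth` as an argument and returning lines keeps the function free of global state, which also makes it easy to test.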


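You can also define a handler function for each tag and put them all in a dictionary for easy lookup. Then check whether there is a handler for the current tag and call it; otherwise fall back to printing blindly. For example: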

# Reuses ET, tree, indent and ignoreElems from the first snippet.

def handle_property(root):
    """Takes property root element and prints the values."""
    data = ' ' * indent + '%s: %s ' % (root.tag.title(), root.attrib['name'])
    values = []
    for elem in root:
        if elem.tag == 'value':
            values.append(elem.text)
    print(data + '| %s' % ', '.join(values))

# Map tag names directly to handler functions; add more
# '<tag_name>': <handler_function> pairs as needed.
HANDLERS = {
    'property': handle_property,
}

# printRecur gets modified accordingly.
def printRecur(root):
    """Recursively prints the tree."""
    global indent
    if root.tag in ignoreElems:
        return

    if root.tag in HANDLERS:
        HANDLERS[root.tag](root)
    else:
        print(' ' * indent + '%s: %s' % (root.tag.title(), root.attrib.get('name', root.text)))
        indent += 4  # children are indented one level deeper than their parent
        for elem in root:
            printRecur(elem)
        indent -= 4

printRecur(tree.getroot())

Output of the above:

Suite: MainApplication
    Application: Sub Application1
        Category: about
            Category: comms
                Property: copyright | 2014
                Property: os | Linux 2.6.32-431.29.2.el6.x86_64
            Property: releaseNumber | 9.1.0.3.0.54

I find this much more useful than piling lots of if/else branches into the code.

  • Comment: `indent` is used before it is declared `global` in the function. (2 upvotes)
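The comment above points at the module-level `indent` global. A minimal sketch of the same dispatch-table idea without any global state passes the indentation down as a parameter and collects lines in a list instead of printing directly. The names `print_recur`, `IGNORED`, and `SAMPLE` are illustrative rather than taken from the answer, and `SAMPLE` is a cut-down fragment of work.xml.

```python
import xml.etree.ElementTree as ET

IGNORED = {'displayNameKey', 'displayName'}

def handle_property(elem, depth, out):
    # Join all <value> children, as the answer's handle_property does.
    values = [v.text or '' for v in elem.findall('value')]
    out.append(' ' * depth + 'Property: %s | %s' % (elem.get('name'), ', '.join(values)))

# Tag name -> handler function; no globals() lookup needed.
HANDLERS = {'property': handle_property}

def print_recur(elem, depth=0, out=None):
    """Collect output lines; depth travels down the recursion instead of a global."""
    if out is None:
        out = []
    if elem.tag in IGNORED:
        return out
    handler = HANDLERS.get(elem.tag)
    if handler is not None:
        handler(elem, depth, out)
    else:
        out.append(' ' * depth + '%s: %s' % (elem.tag.title(), elem.get('name', elem.text)))
        for child in elem:
            print_recur(child, depth + 4, out)
    return out

SAMPLE = """<category id="2423" name="comms">
<displayName>subApp.comms</displayName>
<property id="5909" name="copyright" type="string_property" width="40"><value>2014</value></property>
</category>"""

lines = print_recur(ET.fromstring(SAMPLE))
print('\n'.join(lines))
```

Because each call receives its own `depth`, there is nothing to increment and decrement around the recursive call, so the ordering mistake from the comment cannot occur.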