Python XML parsing

Ple*_*Guy 9 python xml parsing lxml

*Note: lxml will not run on my system. I'm hoping to find a solution that doesn't involve lxml.

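I've been reading through some of the documentation here, and I'm having a hard time getting this to work the way I want. I'd like to parse some XML files that look like this: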

<dict>
    <key>1375</key>
    <dict>
        <key>Key 1</key><integer>1375</integer>
        <key>Key 2</key><string>Some String</string>
        <key>Key 3</key><string>Another string</string>
        <key>Key 4</key><string>Yet another string</string>
        <key>Key 5</key><string>Strings anyone?</string>
    </dict>
</dict>

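In the file I'm trying to manipulate, many more "dict" elements follow this one. I'd like to read through the XML and output a text/dat file that looks like this: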

1375,"Some String","Another string","Yet another string","Strings anyone?"

...

EOF
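The element names in the sample (alternating <key> and value elements such as <string> and <integer> inside <dict>) appear to match Apple's XML property-list format, so one option that neither answer below uses is the standard library's plistlib module, which needs no lxml. This is a minimal sketch that assumes the real file is a well-formed plist; the <plist version="1.0"> wrapper and the helper name plist_to_rows are illustrative additions, not part of the question.

```python
import csv
import io
import plistlib

# The question's sample, wrapped in a <plist> root the way real
# property-list files are (an assumption about the actual input).
SAMPLE = b"""<plist version="1.0">
<dict>
    <key>1375</key>
    <dict>
        <key>Key 1</key><integer>1375</integer>
        <key>Key 2</key><string>Some String</string>
        <key>Key 3</key><string>Another string</string>
        <key>Key 4</key><string>Yet another string</string>
        <key>Key 5</key><string>Strings anyone?</string>
    </dict>
</dict>
</plist>"""


def plist_to_rows(xml_bytes):
    """Return one list of values per inner dict (hypothetical helper)."""
    # FMT_XML tells plistlib which format to expect instead of sniffing it
    top = plistlib.loads(xml_bytes, fmt=plistlib.FMT_XML)
    # plistlib builds ordinary dicts, which keep the file's key order
    return [list(inner.values()) for inner in top.values()]


out = io.StringIO()
# QUOTE_NONNUMERIC leaves the integer bare and quotes every string
writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator='\n')
writer.writerows(plist_to_rows(SAMPLE))
print(out.getvalue(), end='')
```

This relies on every record listing its keys in the same order; a record with a missing key would simply produce a shorter row.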

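**Initially I tried lxml, but after many attempts to get it working on my system I moved on to DOM. More recently I've tried using etree for this task. For the love of all that is good, can someone please help me? I'm new to Python and would like to understand how this works. Thanks in advance.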

Joh*_*hin 10

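You can use xml.etree.ElementTree, which ships with Python. There is also an included companion C implementation (i.e., faster), xml.etree.cElementTree. lxml.etree offers a superset of that functionality, but you don't need it for what you want to do.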
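A common way to use the companion C implementation when it exists is to try it first and fall back to the pure module. The sketch below assumes nothing beyond the standard library; the sample XML string is illustrative only. Note that since Python 3.3 xml.etree.ElementTree uses its C accelerator automatically when available, and xml.etree.cElementTree was removed in Python 3.9, so on current versions the fallback branch is the one that runs.

```python
try:
    # Python 2 and Python 3.0-3.8: explicitly load the C implementation
    import xml.etree.cElementTree as et
except ImportError:
    # Python 3.9+ removed cElementTree; from 3.3 on, ElementTree already
    # uses the C accelerator when it is available
    import xml.etree.ElementTree as et

# Either module exposes the same API, so the rest of the code is unchanged
root = et.fromstring("<dict><key>1375</key><dict/></dict>")
print(root.find("key").text)
```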

The code provided by @Acorn works the same for me (Python 2.7, Windows 7) with any one of the following imports:

import xml.etree.ElementTree as et
# or: import xml.etree.cElementTree as et
# or: import lxml.etree as et
...
tree = et.fromstring(xmltext)
...

What operating system are you using, and what installation problems did you run into with lxml?


Aco*_*orn 7

import xml.etree.ElementTree as et
import csv

xmltext = """
<dicts>
    <key>1375</key>
    <dict>
        <key>Key 1</key><integer>1375</integer>
        <key>Key 2</key><string>Some String</string>
        <key>Key 3</key><string>Another string</string>
        <key>Key 4</key><string>Yet another string</string>
        <key>Key 5</key><string>Strings anyone?</string>
    </dict>
</dicts>
"""

tree = et.fromstring(xmltext)

# newline='' stops the csv module from writing blank lines on Windows (Python 3)
with open('output.txt', 'w', newline='') as f:
    # QUOTE_NONNUMERIC quotes every field that is not a number
    writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)

    # iterate over the dict elements
    for dict_el in tree.iterfind('dict'):
        data = []
        # get the text contents of each non-key element
        for el in dict_el:
            if el.tag == 'string':
                data.append(el.text)
            # if it's an integer element, convert to int so csv won't quote it
            elif el.tag == 'integer':
                data.append(int(el.text))
        writer.writerow(data)
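Since the question mentions many more dict elements following the first one, a streaming variant may be worth considering for large files. This sketch swaps et.fromstring for et.iterparse, tracks nesting depth from start/end events, and writes a row each time a <dict> sitting directly under the root element finishes. It assumes the question's layout (a single outer <dict> as the root) and Python 3; the function name write_rows is hypothetical.

```python
import csv
import xml.etree.ElementTree as et


def write_rows(xml_source, out_file):
    """Stream xml_source (a path or binary file object) and write one CSV
    row to out_file for each <dict> directly under the root element."""
    writer = csv.writer(out_file, quoting=csv.QUOTE_NONNUMERIC)
    depth = 0
    for event, el in et.iterparse(xml_source, events=('start', 'end')):
        if event == 'start':
            depth += 1
            continue
        depth -= 1
        # depth == 1 after the decrement: el was a direct child of the root
        if el.tag == 'dict' and depth == 1:
            row = []
            for child in el:
                if child.tag == 'string':
                    row.append(child.text)
                elif child.tag == 'integer':
                    # ints stay unquoted under QUOTE_NONNUMERIC
                    row.append(int(child.text))
            writer.writerow(row)
            # free the processed record's children
            el.clear()
```

With hypothetical file names it would be called as `write_rows('input.xml', f)` inside a `with open('output.txt', 'w', newline='') as f:` block. Clearing each record frees its children, but the root still holds an (empty) reference to every record, so memory use is reduced rather than constant.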