How do I map to a dictionary rather than a list?

Question

sig*_*nce 1 python xml lxml dictionary list

I have the following function, which does a basic job of mapping an lxml object to a dictionary...

from lxml import etree 

tree = etree.parse('file.xml')
root = tree.getroot()

def xml_to_dict(el):
    d={}
    if el.text:
        print '***write tag as string'
        d[el.tag] = el.text
    else:
        d[el.tag] = {}
    children = el.getchildren()
    if children:
        d[el.tag] = map(xml_to_dict, children)
    return d

v = xml_to_dict(root)

At the moment it gives me...

>>>print v
{'root': [{'a': '1'}, {'a': [{'b': '2'}, {'b': '2'}]}, {'aa': '1a'}]}

But I want...

>>>print v
{'root': {'a': ['1', {'b': [2, 2]}], 'aa': '1a'}}

How can I rewrite the function xml_to_dict(el) so that I get the desired output?

Here is the XML I'm parsing, for clarity.

<root>
    <a>1</a>
    <a>
        <b>2</b>
        <b>2</b>
    </a>
    <aa>1a</aa>
</root>

Thanks :)

Answer 1

Tho*_*ers 5

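Well, map() always returns a list (in Python 2, which this code targets), so the simple answer is "don't use map()". Instead, build a dictionary the way you already do: loop over the children and assign the result of xml_to_dict(child) to the dictionary key you want to use. It looks like you want the tag as the key and a list of the items with that tag as the value, so it would become something like: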

import collections
from lxml import etree

tree = etree.parse('file.xml')
root = tree.getroot()

def xml_to_dict(el):
    d={}
    if el.text:
        print '***write tag as string'
        d[el.tag] = el.text
    child_dicts = collections.defaultdict(list)
    for child in el.getchildren():
        child_dicts[child.tag].append(xml_to_dict(child))
    if child_dicts:
        d[el.tag] = child_dicts
    return d

xml_to_dict(root)

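This makes the tag entry in the dict a defaultdict; if you need a normal dict for some reason, use d[el.tag] = dict(child_dicts). Note that, as before, if a tag contains both text and children, the text will not appear in the dict. You may want to consider a different layout for your dict to cope with that.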
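One possible layout that keeps both the text and the children is sketched below. This is only an illustration, not part of the original answer: the 'text' and 'children' keys and the name element_to_node are made up for the example, it is written for Python 3, and it uses the standard library's xml.etree.ElementTree as a stand-in for lxml (the tag and text attributes and iteration over children are used the same way in both). Tail text that follows a child's closing tag is not captured.

```python
import collections
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)


def element_to_node(el):
    """Return a dict with separate 'text' and 'children' slots (hypothetical layout)."""
    # Strip whitespace-only text, such as the indentation between tags.
    text = (el.text or '').strip()
    children = collections.defaultdict(list)
    for child in el:  # iterating an element yields its direct children
        children[child.tag].append(element_to_node(child))
    return {'text': text or None, 'children': dict(children)}


def xml_to_dict(el):
    return {el.tag: element_to_node(el)}


root = ET.fromstring('<a>hello<b>2</b><b>3</b></a>')
result = xml_to_dict(root)
print(result)
```

Because every element gets the same two slots, a tag with both text and children no longer loses its text, at the cost of a more verbose structure.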

Edit:

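The code that produces the output in your rephrased question would not recurse on xml_to_dict, because you only want a dict for the outer element, not for all the child tags. So you would use something like: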

import collections
from lxml import etree

tree = etree.parse('file.xml')
root = tree.getroot()

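def xml_to_item(el):
    # Default to None so an element with no text and no children does not
    # raise UnboundLocalError on the return line.
    item = None
    if el.text:
        print '***write tag as string'
        item = el.text
    child_dicts = collections.defaultdict(list)
    for child in el.getchildren():
        child_dicts[child.tag].append(xml_to_item(child))
    # Children take priority; fall back to the text when there are none.
    return dict(child_dicts) or item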

def xml_to_dict(el):
    return {el.tag: xml_to_item(el)}

print xml_to_dict(root)

This still doesn't handle tags with both text and children, and it turns the collections.defaultdict(list) into a normal dict, so the output is (almost) what you expect:

***write tag as string
***write tag as string
***write tag as string
***write tag as string
***write tag as string
***write tag as string
{'root': {'a': ['1', {'b': ['2', '2']}], 'aa': ['1a']}}

(If you really want integers rather than strings for the text data in the b tags, you will have to convert them to integers explicitly somehow.)
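A post-processing pass is one way to close the remaining gap. The sketch below is an assumption about how that could look rather than anything from the original answer: simplify and its int_tags parameter are hypothetical names, and the code is written for Python 3. It walks the nested structure, unwraps single-item lists, and converts the text of the listed tags to int.

```python
def simplify(value, tag=None, int_tags=frozenset()):
    """Unwrap one-item lists and convert text of the tags in int_tags to int."""
    if isinstance(value, dict):
        # Each key is a tag name; pass it down so leaves know their tag.
        return {k: simplify(v, k, int_tags) for k, v in value.items()}
    if isinstance(value, list):
        items = [simplify(v, tag, int_tags) for v in value]
        return items[0] if len(items) == 1 else items
    if tag in int_tags and value is not None:
        return int(value)  # raises ValueError if the text is not numeric
    return value


# The structure printed by the code above.
raw = {'root': {'a': ['1', {'b': ['2', '2']}], 'aa': ['1a']}}
result = simplify(raw, int_tags={'b'})
print(result)
# {'root': {'a': ['1', {'b': [2, 2]}], 'aa': '1a'}}
```

Bear in mind that unwrapping single-item lists makes the shape depend on the data: a tag that appears once gives a bare value, while the same tag appearing twice gives a list, so consumers have to handle both.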
