Dictionaries of dictionaries merge

Question

fdh*_*hex 108 python merge dictionary array-merge

I need to merge multiple dictionaries. Here is an example of what I have:

dict1 = {1:{"a":{A}}, 2:{"b":{B}}}

dict2 = {2:{"c":{C}}, 3:{"d":{D}}

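With A, B, C and D being the leaves of the tree, like {"info1":"value", "info2":"value2"}

The dictionaries have an unknown level (depth) of nesting; it could be {2:{"c":{"z":{"y":{C}}}}}

In my case it represents a directory/file structure, with nodes being docs and leaves being files.

I want to merge them to obtain: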

 dict3 = {1:{"a":{A}}, 2:{"b":{B},"c":{C}}, 3:{"d":{D}}}

I am not sure how I could do that easily with Python.

Answer 1

and*_*oke 124

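This is actually quite tricky - particularly if you want a useful error message when things are inconsistent, while correctly accepting duplicate but consistent entries (something no other answer here does....)

Assuming you don't have huge numbers of entries, a recursive function is easiest: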

def merge(a, b, path=None):
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass # same leaf value
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

# works
print(merge({1:{"a":"A"},2:{"b":"B"}}, {2:{"c":"C"},3:{"d":"D"}}))
# has conflict
merge({1:{"a":"A"},2:{"b":"B"}}, {1:{"a":"A"},2:{"b":"C"}})

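Note that this mutates a - the contents of b are added to a (which is also returned). If you want to keep a, you could call it like merge(dict(a), b).

agf pointed out (below) that you may have more than two dicts, in which case you can use: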

reduce(merge, [dict1, dict2, dict3...])

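where everything will be added to dict1.

[note - I edited my initial answer to mutate the first argument; that makes the "reduce" easier to explain]

ps in python 3, you will also need from functools import reduce

> If you want to keep a you could call it like merge(dict(a), b). Note that nested dicts will still be mutated. To avoid this, use copy.deepcopy. (3 upvotes)
For anyone who has lists as the final nested level under the dicts, you can do this instead of raising the error to concatenate the two lists: `a[key] = a[key] + b[key]`. Thanks for the helpful answer. (2 upvotes)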
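
The point about nested dicts still being mutated can be checked with a small experiment. The snippet below is only a sketch: its `merge` is a simplified stand-in (it lets b win instead of raising on conflicts), and all the sample dicts are made up for illustration.

```python
import copy
from functools import reduce  # needed in Python 3

def merge(a, b):
    """Simplified in-place merge for illustration: b wins on conflicts
    (the answer's version raises an exception instead)."""
    for key in b:
        if key in a and isinstance(a[key], dict) and isinstance(b[key], dict):
            merge(a[key], b[key])
        else:
            a[key] = b[key]
    return a

b = {2: {"c": "C"}}

a_shallow = {2: {"b": "B"}}
merge(dict(a_shallow), b)           # dict() copies only the top level
print(a_shallow)                    # {2: {'b': 'B', 'c': 'C'}}  nested dict was mutated

a_deep = {2: {"b": "B"}}
result = merge(copy.deepcopy(a_deep), b)
print(a_deep)                       # {2: {'b': 'B'}}  untouched
print(result)                       # {2: {'b': 'B', 'c': 'C'}}

# With reduce, copying every input keeps all of them untouched.
dicts = [{1: {"a": "A"}}, {1: {"b": "B"}, 2: {"c": "C"}}, {2: {"d": "D"}}]
combined = reduce(merge, [copy.deepcopy(d) for d in dicts])
print(combined)                     # {1: {'a': 'A', 'b': 'B'}, 2: {'c': 'C', 'd': 'D'}}
```

Copying every input (not just the first) matters with reduce, because a nested dict taken over from the second input would otherwise be modified when the third one is merged in.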

Answer 2

jte*_*ace 27

Here's an easy way to do it using generators:

def mergedicts(dict1, dict2):
    for k in set(dict1.keys()).union(dict2.keys()):
        if k in dict1 and k in dict2:
            if isinstance(dict1[k], dict) and isinstance(dict2[k], dict):
                yield (k, dict(mergedicts(dict1[k], dict2[k])))
            else:
                # If one of the values is not a dict, you can't continue merging it.
                # Value from second dict overrides one in first and we move on.
                yield (k, dict2[k])
                # Alternatively, replace this with exception raiser to alert you of value conflicts
        elif k in dict1:
            yield (k, dict1[k])
        else:
            yield (k, dict2[k])

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}

print(dict(mergedicts(dict1, dict2)))

This prints:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

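For added elegance, remove the parentheses around the yield values and change the first for loop to `for k in set(dict1) | set(dict2):`. (4 upvotes)
I found this particularly helpful. But the nicest thing would be to let the function take the conflict resolution as a parameter. (2 upvotes)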
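
One way the suggestion in the last comment might look is sketched below. The `on_conflict` parameter and the `raise_on_conflict` helper are hypothetical names introduced here, not part of the answer; the default resolver keeps the value from the second dict, as the original generator does.

```python
def mergedicts(dict1, dict2, on_conflict=lambda key, v1, v2: v2):
    """Yield merged (key, value) pairs; on_conflict decides non-dict clashes.

    on_conflict is a hypothetical hook, not part of the answer above.
    The default keeps the value from dict2, like the original.
    """
    for k in set(dict1) | set(dict2):
        if k in dict1 and k in dict2:
            if isinstance(dict1[k], dict) and isinstance(dict2[k], dict):
                yield k, dict(mergedicts(dict1[k], dict2[k], on_conflict))
            else:
                yield k, on_conflict(k, dict1[k], dict2[k])
        elif k in dict1:
            yield k, dict1[k]
        else:
            yield k, dict2[k]

def raise_on_conflict(key, v1, v2):
    """Example resolver: accept equal values, reject different ones."""
    if v1 != v2:
        raise ValueError('Conflict at key %r: %r != %r' % (key, v1, v2))
    return v1

d1 = {1: {"a": "A"}, 2: {"b": "B"}}
d2 = {1: {"a": "A"}, 2: {"b": "C"}}

print(dict(mergedicts(d1, d2)))  # default resolver: the value from d2 wins

merged_sum = dict(mergedicts({"n": 1}, {"n": 2}, lambda k, x, y: x + y))
print(merged_sum)  # {'n': 3}

try:
    dict(mergedicts(d1, d2, raise_on_conflict))
except ValueError as exc:
    print(exc)  # reports the conflicting key and both values
```

Passing the resolver down through the recursive call keeps the same policy at every nesting level.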

Answer 3

Tra*_*rke 26

You could try mergedeep.

Installation

$ pip3 install mergedeep

Usage

from mergedeep import merge

a = {"keyA": 1}
b = {"keyB": {"sub1": 10}}
c = {"keyB": {"sub2": 20}}

merge(a, b, c) 

print(a)
# {"keyA": 1, "keyB": {"sub1": 10, "sub2": 20}}

For a full list of options, check out the docs!

I found that the solution I needed was merge({}, a, b) (2 upvotes)

Answer 4

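Anonymous 20

One issue with this question is that the values of the dict can be arbitrarily complex pieces of data. Based upon these and other answers I came up with this code: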

class YamlReaderError(Exception):
    pass

def data_merge(a, b):
    """merges b into a and return merged result

    NOTE: tuples and arbitrary objects are not handled as it is totally ambiguous what should happen"""
    key = None
    # ## debug output
    # sys.stderr.write("DEBUG: %s to %s\n" %(b,a))
    try:
        if a is None or isinstance(a, str) or isinstance(a, int) or isinstance(a, float):
            # border case for first run or if a is a primitive
            # (Python 3: the separate `unicode` and `long` types no longer exist)
            a = b
        elif isinstance(a, list):
            # lists can be only appended
            if isinstance(b, list):
                # merge lists
                a.extend(b)
            else:
                # append to list
                a.append(b)
        elif isinstance(a, dict):
            # dicts must be merged
            if isinstance(b, dict):
                for key in b:
                    if key in a:
                        a[key] = data_merge(a[key], b[key])
                    else:
                        a[key] = b[key]
            else:
                raise YamlReaderError('Cannot merge non-dict "%s" into dict "%s"' % (b, a))
        else:
            raise YamlReaderError('NOT IMPLEMENTED "%s" into "%s"' % (b, a))
    except TypeError as e:  # Python 3 syntax; Python 2 wrote `except TypeError, e`
        raise YamlReaderError('TypeError "%s" in key "%s" when merging "%s" into "%s"' % (e, key, b, a))
    return a

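My use case is merging YAML files, where I only have to deal with a subset of the possible data types. Hence I can ignore tuples and other objects. For me a sensible merge logic means

replace scalars
append lists
merge dicts by adding missing keys and updating existing keys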
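
The three rules above can be sketched compactly as follows. This is a hypothetical condensed variant written for Python 3, not the answer's code; the function name `data_merge_sketch` and the sample config values are assumptions made for illustration.

```python
def data_merge_sketch(a, b):
    """Hypothetical compact version of the three rules (not the answer's code)."""
    if isinstance(a, dict) and isinstance(b, dict):
        # rule 3: merge dicts, adding missing keys and updating existing ones
        for key, value in b.items():
            a[key] = data_merge_sketch(a[key], value) if key in a else value
        return a
    if isinstance(a, list):
        # rule 2: append to lists (extend when b is a list as well)
        a.extend(b if isinstance(b, list) else [b])
        return a
    if a is None or isinstance(a, (str, int, float)):
        # rule 1: replace scalars
        return b
    # anything else (e.g. a non-dict merged into a dict) is an error
    raise TypeError('cannot merge %r into %r' % (b, a))

base = {"name": "app", "ports": [80], "env": {"debug": False}}
override = {"name": "app-dev", "ports": [8080], "env": {"debug": True, "level": 3}}

print(data_merge_sketch(base, override))
# {'name': 'app-dev', 'ports': [80, 8080], 'env': {'debug': True, 'level': 3}}
```

Like the full version, this mutates and returns its first argument.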

Everything else, and anything unforeseen, results in an error.

Couldn't the "isinstance" sequence be replaced with `isinstance(a,(str,unicode,int,long,float))`? (2 upvotes)

Answer 5

Aar*_*all 12

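Dictionaries of dictionaries merge

As this is the canonical question (in spite of certain non-generalities), I'm providing the canonical Pythonic approach to solving this issue.

Simplest case: "leaves are nested dicts that end in empty dicts":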

d1 = {'a': {1: {'foo': {}}, 2: {}}}
d2 = {'a': {1: {}, 2: {'bar': {}}}}
d3 = {'b': {3: {'baz': {}}}}
d4 = {'a': {1: {'quux': {}}}}

This is the simplest case for recursion, and I would recommend two naive approaches:

def rec_merge1(d1, d2):
    '''return new merged dict of dicts'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge1(v, d2[k])
    d3 = d1.copy()
    d3.update(d2)
    return d3

def rec_merge2(d1, d2):
    '''update first dict with second recursively'''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            d2[k] = rec_merge2(v, d2[k])
    d1.update(d2)
    return d1

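I believe I would prefer the second to the first, but keep in mind that the original state of the first would have to be rebuilt from its origin. Here's the usage: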

>>> from functools import reduce # only required for Python 3.
>>> reduce(rec_merge1, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}
>>> reduce(rec_merge2, (d1, d2, d3, d4))
{'a': {1: {'quux': {}, 'foo': {}}, 2: {'bar': {}}}, 'b': {3: {'baz': {}}}}
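
The remark about having to rebuild the original state can be made concrete with a small check. The sketch below repeats `rec_merge2` and uses made-up inputs; the `copy.deepcopy` calls are one possible way to protect the inputs, not something the answer prescribes.

```python
import copy

def rec_merge2(d1, d2):
    '''update first dict with second recursively (as in the answer)'''
    for k, v in d1.items():
        if k in d2:
            d2[k] = rec_merge2(v, d2[k])
    d1.update(d2)
    return d1

first = {'a': {1: {}}}
second = {'a': {2: {}}}
result = rec_merge2(first, second)

print(result is first)  # True: the first dict itself was updated
print(first)            # {'a': {1: {}, 2: {}}}
print(second)           # {'a': {1: {}, 2: {}}}  (the second argument changed too)

# Passing deep copies leaves the originals as they were.
x = {'a': {1: {}}}
y = {'a': {2: {}}}
merged = rec_merge2(copy.deepcopy(x), copy.deepcopy(y))
print(x, y)             # {'a': {1: {}}} {'a': {2: {}}}
```

Note that the second argument is modified as well, because merged sub-dicts are written back into it before the final update.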

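Complex case: "leaves are of any other type:"

So if the leaves end in dicts, it's a simple case of merging the empty dicts at the end. If not, it's not so trivial. If they are strings, how do you merge them? Sets can be updated similarly, so we could give them that treatment, but we lose the order in which they were merged. So does order matter?

So in lieu of more information, the simplest approach is to give the values the standard update treatment when they are not both dicts: i.e. the second dict's value will overwrite the first, even if the second dict's value is None and the first's value is a dict with a lot of info.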

d1 = {'a': {1: 'foo', 2: None}}
d2 = {'a': {1: None, 2: 'bar'}}
d3 = {'b': {3: 'baz'}}
d4 = {'a': {1: 'quux'}}

from collections.abc import MutableMapping  # collections.MutableMapping was removed in Python 3.10

def rec_merge(d1, d2):
    '''
    Update two dicts of dicts recursively, 
    if either mapping has leaves that are non-dicts, 
    the second's leaf overwrites the first's.
    '''
    for k, v in d1.items(): # in Python 2, use .iteritems()!
        if k in d2:
            # this next check is the only difference!
            if all(isinstance(e, MutableMapping) for e in (v, d2[k])):
                d2[k] = rec_merge(v, d2[k])
            # we could further check types and merge as appropriate here.
    d3 = d1.copy()
    d3.update(d2)
    return d3

Now

from functools import reduce
reduce(rec_merge, (d1, d2, d3, d4))

returns

{'a': {1: 'quux', 2: 'bar'}, 'b': {3: 'baz'}}
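
The idea of giving sets their own treatment, mentioned earlier in this answer, could look roughly like the sketch below. `rec_merge_sets` is a hypothetical variant, not the answer's function: dicts recurse, sets are unioned (losing any notion of order), and every other value is overwritten by the second dict.

```python
from collections.abc import MutableMapping

def rec_merge_sets(d1, d2):
    """Hypothetical variant of rec_merge: dicts recurse, sets are unioned,
    anything else is overwritten by the value from d2."""
    merged = dict(d1)  # shallow copy, so d1's top level is not modified
    for k, v2 in d2.items():
        v1 = merged.get(k)
        if isinstance(v1, MutableMapping) and isinstance(v2, MutableMapping):
            merged[k] = rec_merge_sets(v1, v2)
        elif isinstance(v1, set) and isinstance(v2, set):
            merged[k] = v1 | v2  # union; element order is not preserved
        else:
            merged[k] = v2
    return merged

left = {'a': {'tags': {'x'}, 'n': 1}}
right = {'a': {'tags': {'y'}, 'n': None}}

combined = rec_merge_sets(left, right)
print(combined == {'a': {'tags': {'x', 'y'}, 'n': None}})  # True
print(left)  # {'a': {'tags': {'x'}, 'n': 1}}  (input left as it was)
```

As in the answer, a None in the second dict still overwrites the first dict's value.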

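Application to the original question:

I've had to remove the curly braces around the letters and put them in single quotes for this to be legitimate Python (otherwise they would be set literals in Python 2.7+), as well as append a missing brace: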

dict1 = {1:{"a":'A'}, 2:{"b":'B'}}
dict2 = {2:{"c":'C'}, 3:{"d":'D'}}

and rec_merge(dict1, dict2) now returns:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}

which matches the desired outcome of the original question (after changing, e.g., the {A} to 'A').

Answer 6

Anonymous 9

Based on @andrew cooke. This version handles nested lists of dicts and also offers an option to update values

def merge(a, b, path=None, update=True):
    "http://stackoverflow.com/questions/7204805/python-dictionaries-of-dictionaries-merge"
    "merges b into a"
    if path is None: path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge(a[key], b[key], path + [str(key)], update=update)  # pass the update flag down
            elif a[key] == b[key]:
                pass # same leaf value
            elif isinstance(a[key], list) and isinstance(b[key], list):
                for idx, val in enumerate(b[key]):
                    a[key][idx] = merge(a[key][idx], b[key][idx], path + [str(key), str(idx)], update=update)
            elif update:
                a[key] = b[key]
            else:
                raise Exception('Conflict at %s' % '.'.join(path + [str(key)]))
        else:
            a[key] = b[key]
    return a

Answer 7

Han*_*ter 7

Short-n-sweet:

from collections.abc import MutableMapping as Map

def nested_update(d, v):
    """
    Nested update of dict-like 'd' with dict-like 'v'.
    """

    for key in v:
        if key in d and isinstance(d[key], Map) and isinstance(v[key], Map):
            nested_update(d[key], v[key])
        else:
            d[key] = v[key]

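This works like (and is built on) Python's dict.update method. It returns None (you can always add return d if you prefer) as it updates dict d in-place. Keys in v will overwrite any existing keys in d (it does not try to interpret the dict's contents).

It will also work for other ("dict-like") mappings.

Example: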

people = {'pete': {'gender': 'male'}, 'mary': {'age': 34}}
nested_update(people, {'pete': {'age': 41}})

# Pete's age was merged in
print(people)
{'pete': {'gender': 'male', 'age': 41}, 'mary': {'age': 34}}

Python's regular dict.update method yields:

people = {'pete': {'gender': 'male'}, 'mary': {'age': 34}}
people.update({'pete': {'age': 41}})

# We lost Pete's gender here!
print(people) 
{'pete': {'age': 41}, 'mary': {'age': 34}}
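
The claim that nested_update also works for other dict-like mappings can be tried out with `collections.UserDict`, which is a `MutableMapping` but not a `dict` subclass. The `Registry` class below is a made-up example type, and the function body is repeated from the answer so that the snippet runs on its own.

```python
from collections import UserDict
from collections.abc import MutableMapping as Map

def nested_update(d, v):
    """Nested update of dict-like 'd' with dict-like 'v' (as in the answer)."""
    for key in v:
        if key in d and isinstance(d[key], Map) and isinstance(v[key], Map):
            nested_update(d[key], v[key])
        else:
            d[key] = v[key]

class Registry(UserDict):
    """Hypothetical dict-like class used only for this illustration."""

people = Registry({'pete': Registry({'gender': 'male'})})
nested_update(people, {'pete': {'age': 41}, 'mary': {'age': 34}})

print(dict(people['pete']))           # {'gender': 'male', 'age': 41}
print(type(people['pete']).__name__)  # Registry
print(people['mary'])                 # {'age': 34}
```

The nested Registry keeps its type because it is updated in place, while the new 'mary' entry is stored as the plain dict that was passed in.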

Answer 8

Spe*_*bun 6

If you have an unknown level of dictionaries, then I would suggest a recursive function:

def combineDicts(dictionary1, dictionary2):
    output = {}
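    for item, value in dictionary1.items():  # Python 3; Python 2 used .iteritems()
        if item in dictionary2:  # Python 3; Python 2 used .has_key(item)
            if isinstance(value, dict) and isinstance(dictionary2[item], dict):
                # pop() removes the key from dictionary2, so that argument is modified
                output[item] = combineDicts(value, dictionary2.pop(item))
        else:
            output[item] = value
    # whatever is left in dictionary2 is new or overrides dictionary1
    for item, value in dictionary2.items():
        output[item] = value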
    return output

Answer 9

Mic*_*tor 5

This simple recursive procedure will merge one dictionary into another while overriding conflicting keys:

#!/usr/bin/env python2.7

def merge_dicts(dict1, dict2):
    """ Recursively merges dict2 into dict1 """
    if not isinstance(dict1, dict) or not isinstance(dict2, dict):
        return dict2
    for k in dict2:
        if k in dict1:
            dict1[k] = merge_dicts(dict1[k], dict2[k])
        else:
            dict1[k] = dict2[k]
    return dict1

print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {2:{"c":"C"}, 3:{"d":"D"}}))
print (merge_dicts({1:{"a":"A"}, 2:{"b":"B"}}, {1:{"a":"A"}, 2:{"b":"C"}}))

Output:

{1: {'a': 'A'}, 2: {'c': 'C', 'b': 'B'}, 3: {'d': 'D'}}
{1: {'a': 'A'}, 2: {'b': 'C'}}

Answer 10

Sas*_*cha 5

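Overview

The following approach subdivides the problem of a deep merge of dicts into:

A parameterized shallow merge function merge(f)(a,b) that uses a function f to merge two dicts a and b
A recursive merger function f to be used together with merge

Implementation

A function for merging two (non-nested) dicts can be written in a lot of ways. I personally like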

def merge(f):
    def merge(a,b): 
        keys = a.keys() | b.keys()
        return {key:f(a.get(key), b.get(key)) for key in keys}
    return merge

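A nice way of defining an appropriate recursive merger function f is to use multipledispatch, which allows defining functions that evaluate along different paths depending on the type of their arguments.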

from multipledispatch import dispatch

#for anything that is not a dict return
@dispatch(object, object)
def f(a, b):
    return b if b is not None else a

#for dicts recurse 
@dispatch(dict, dict)
def f(a,b):
    return merge(f)(a,b)

Example

To merge two nested dicts, simply use merge(f), e.g.:

dict1 = {1:{"a":"A"},2:{"b":"B"}}
dict2 = {2:{"c":"C"},3:{"d":"D"}}
merge(f)(dict1, dict2)
#returns {1: {'a': 'A'}, 2: {'b': 'B', 'c': 'C'}, 3: {'d': 'D'}}

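Notes:

The advantages of this approach are:

The function is built from smaller functions that each do a single thing, which makes the code simpler to reason about and test
The behaviour is not hard-coded, but can be changed and extended as needed, which improves code reuse (see example below).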
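
To illustrate the second point without extra dependencies, here is a minimal sketch that reuses the merge(f) idea with a different merger. The `add_numbers` function and the sample data are hypothetical, and plain isinstance checks stand in for multipledispatch.

```python
def merge(f):
    """Shallow merge parameterized by f, as in the answer above."""
    def merge_with(a, b):
        keys = a.keys() | b.keys()
        return {key: f(a.get(key), b.get(key)) for key in keys}
    return merge_with

def add_numbers(a, b):
    """Hypothetical merger: recurse into dicts, add numbers, else prefer b.
    Written with plain isinstance checks instead of multipledispatch."""
    if isinstance(a, dict) and isinstance(b, dict):
        return merge(add_numbers)(a, b)
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a + b
    return b if b is not None else a

stock1 = {"fruit": {"apples": 2, "pears": 1}}
stock2 = {"fruit": {"apples": 3}, "veg": {"leeks": 4}}

total = merge(add_numbers)(stock1, stock2)
print(total == {"fruit": {"apples": 5, "pears": 1}, "veg": {"leeks": 4}})  # True
```

Only the merger changed; the shallow merge function is the same as before, which is the kind of reuse the note describes.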

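Customization

Some answers also considered dicts that contain lists, e.g. of other (potentially nested) dicts. In this case one might want to map over the lists and merge them based on position. This can be done by adding another definition to the merger function f: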

import itertools
@dispatch(list, list)
def f(a,b):
    # dispatch each pair through f, so a missing (None) or non-dict item is handled too
    return [f(*arg) for arg in itertools.zip_longest(a, b)]

Answer 11

Anonymous 5

Based on the answer from @andrew cooke. It takes care of nested lists in a better way.

def deep_merge_lists(original, incoming):
    """
    Deep merge two lists. Modifies original.
    Recursively call deep merge on each correlated element of list. 
    If item type in both elements are
     a. dict: Call deep_merge_dicts on both values.
     b. list: Recursively call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    If the incoming list is longer than the original, the extra values are appended.
    """
    common_length = min(len(original), len(incoming))
    for idx in range(common_length):
        if isinstance(original[idx], dict) and isinstance(incoming[idx], dict):
            deep_merge_dicts(original[idx], incoming[idx])

        elif isinstance(original[idx], list) and isinstance(incoming[idx], list):
            deep_merge_lists(original[idx], incoming[idx])

        else:
            original[idx] = incoming[idx]

    for idx in range(common_length, len(incoming)):
        original.append(incoming[idx])


def deep_merge_dicts(original, incoming):
    """
    Deep merge two dictionaries. Modifies original.
    For key conflicts if both values are:
     a. dict: Recursively call deep_merge_dicts on both values.
     b. list: Call deep_merge_lists on both values.
     c. any other type: Value is overridden.
     d. conflicting types: Value is overridden.

    """
    for key in incoming:
        if key in original:
            if isinstance(original[key], dict) and isinstance(incoming[key], dict):
                deep_merge_dicts(original[key], incoming[key])

            elif isinstance(original[key], list) and isinstance(incoming[key], list):
                deep_merge_lists(original[key], incoming[key])

            else:
                original[key] = incoming[key]
        else:
            original[key] = incoming[key]

Answer 12

Dav*_*der 5

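In case someone wants yet another approach to this problem, here's my solution.

Virtues: short, declarative, and functional in style (recursive, does no mutation).

Potential drawback: this might not be the merge you're looking for. Consult the docstring for the semantics.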

def deep_merge(a, b):
    """
    Merge two values, with `b` taking precedence over `a`.

    Semantics:
    - If either `a` or `b` is not a dictionary, `a` will be returned only if
      `b` is `None`. Otherwise `b` will be returned.
    - If both values are dictionaries, they are merged as follows:
        * Each key that is found only in `a` or only in `b` will be included in
          the output collection with its value intact.
        * For any key in common between `a` and `b`, the corresponding values
          will be merged with the same semantics.
    """
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if b is None else b
    else:
        # If we're here, both a and b must be dictionaries or subtypes thereof.

        # Compute set of all keys in both dictionaries.
        keys = set(a.keys()) | set(b.keys())

        # Build output dictionary, merging recursively values with common keys,
        # where `None` is used to mean the absence of a value.
        return {
            key: deep_merge(a.get(key), b.get(key))
            for key in keys
        }
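
The semantics described in the docstring have a consequence that is easy to miss: because None stands for "no value", an explicit None in b does not override a. The sketch below condenses the same logic into a few lines and uses made-up inputs to show this.

```python
def deep_merge(a, b):
    """Same logic as the answer's deep_merge, condensed."""
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if b is None else b
    return {key: deep_merge(a.get(key), b.get(key)) for key in set(a) | set(b)}

a = {"x": 1, "y": {"p": 1}, "z": 3}
b = {"x": None, "y": {"q": 2}}

merged = deep_merge(a, b)
print(merged == {"x": 1, "y": {"p": 1, "q": 2}, "z": 3})  # True: "x" keeps 1
print(a)  # {'x': 1, 'y': {'p': 1}, 'z': 3}  (inputs are not mutated)
print(b)  # {'x': None, 'y': {'q': 2}}
```

If None needs to be a real value that can overwrite existing entries, a separate sentinel object for "missing" would be required; that is a design change, not something the answer's code does.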

@schmittsfn It's a dict comprehension. It translates to `dict([key, deep_merge(a.get(key), b.get(key))] for key in keys)` (2 upvotes)

Archived:	14 years, 4 months ago
Viewed:	49432 times
Last activity:	6 years, 3 months ago