Merge and sum of two dictionaries

bad*_*0re 45 python dictionary

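I have the dictionary below, and I want to add it to another dictionary whose elements are not necessarily distinct, merging the results. Is there a built-in function for this, or do I need to write my own?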

{
  '6d6e7bf221ae24e07ab90bba4452267b05db7824cd3fd1ea94b2c9a8': 6,
  '7c4a462a6ed4a3070b6d78d97c90ac230330603d24a58cafa79caf42': 7,
  '9c37bdc9f4750dd7ee2b558d6c06400c921f4d74aabd02ed5b4ddb38': 9,
  'd3abb28d5776aef6b728920b5d7ff86fa3a71521a06538d2ad59375a': 15,
  '2ca9e1f9cbcd76a5ce1772f9b59995fd32cbcffa8a3b01b5c9c8afc2': 11
}

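The number of elements in the dictionary is also unknown.

When the merge encounters two identical keys, the values of those keys should be summed instead of overwritten.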

geo*_*org 133

You didn't say exactly how you want to merge, so take your pick:

x = {'both1':1, 'both2':2, 'only_x': 100 }
y = {'both1':10, 'both2': 20, 'only_y':200 }

print({k: x.get(k, 0) + y.get(k, 0) for k in set(x)})            # keys of x only
print({k: x.get(k, 0) + y.get(k, 0) for k in set(x) & set(y)})   # keys present in both
print({k: x.get(k, 0) + y.get(k, 0) for k in set(x) | set(y)})   # keys present in either

Results:

{'both2': 22, 'only_x': 100, 'both1': 11}
{'both2': 22, 'both1': 11}
{'only_y': 200, 'both2': 22, 'both1': 11, 'only_x': 100}

  • If we have n dictionaries, how could we implement this? (5 upvotes)
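
One possible way to extend the union variant to any number of dictionaries is sketched below. The name `sum_dicts` and the third dictionary `z` are made up for illustration and are not part of the original answer.

```python
def sum_dicts(*dicts):
    """Sum values key by key over the union of all keys (hypothetical helper)."""
    # set().union(...) accepts any number of iterables, including none at all,
    # so the function also works when called without arguments.
    all_keys = set().union(*dicts)
    return {k: sum(d.get(k, 0) for d in dicts) for k in all_keys}


x = {'both1': 1, 'both2': 2, 'only_x': 100}
y = {'both1': 10, 'both2': 20, 'only_y': 200}
z = {'both1': 100, 'only_z': 300}  # assumed extra input for the example

print(sum_dicts(x, y, z))
```

Each key costs one lookup per dictionary, so this stays compact but does more work than a single pass over the items when there are many dictionaries.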

Sco*_*ott 32

You can perform +, -, &, and | (intersection and union) on collections.Counter().

So we can do the following (note: only positive count values will remain in the dictionary):

from collections import Counter

x = {'both1':1, 'both2':2, 'only_x': 100 }
y = {'both1':10, 'both2': 20, 'only_y':200 }

z = dict(Counter(x)+Counter(y))

print(z) # {'both2': 22, 'only_x': 100, 'both1': 11, 'only_y': 200}

To handle cases where the result may be zero or negative, use Counter.update() for addition and Counter.subtract() for subtraction:

x = {'both1':0, 'both2':2, 'only_x': 100 }
y = {'both1':0, 'both2': -20, 'only_y':200 }
xx = Counter(x)
yy = Counter(y)
xx.update(yy)
dict(xx) # {'both2': -18, 'only_x': 100, 'both1': 0, 'only_y': 200}

  • What if both `x` and `y` contain `'both1': 0` and I want `'both1': 0` in `z`? With this solution, `z` will not have the `'both1'` key. (2 upvotes)
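
The behaviour raised in the comment can be seen by running both approaches on the same input. This is a small sketch; the variable names `added` and `updated` are chosen here for illustration only.

```python
from collections import Counter

x = {'both1': 0, 'both2': 2, 'only_x': 100}
y = {'both1': 0, 'both2': -20, 'only_y': 200}

# Counter addition keeps only keys whose resulting count is positive,
# so 'both1' (total 0) and 'both2' (total -18) disappear.
added = dict(Counter(x) + Counter(y))

# Counter.update() adds counts in place and keeps zero and negative totals.
updated = Counter(x)
updated.update(y)
updated = dict(updated)

print(added)    # {'only_x': 100, 'only_y': 200}
print(updated)  # {'both1': 0, 'both2': -18, 'only_x': 100, 'only_y': 200}
```

So if zero or negative totals must survive the merge, the in-place `update()` form is the one to reach for.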

NPE*_*NPE 17

You could use a defaultdict for this:

from collections import defaultdict

def dsum(*dicts):
    ret = defaultdict(int)
    for d in dicts:
        for k, v in d.items():
            ret[k] += v
    return dict(ret)

x = {'both1':1, 'both2':2, 'only_x': 100 }
y = {'both1':10, 'both2': 20, 'only_y':200 }

print(dsum(x, y))

This produces

{'both1': 11, 'both2': 22, 'only_x': 100, 'only_y': 200}


SCB*_*SCB 13

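Additional notes based on the answers of georg, NPE, and Scott.

I was trying to do this on collections of 2 or more dictionaries and was interested in seeing how long each approach took. Because I wanted to do this on any number of dictionaries, I had to change some of the answers slightly. If anyone has better suggestions for them, feel free to edit.

Here's my test method. I recently updated it to include tests with much larger dictionaries:

First, I used the following data: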

import random

x = {'xy1': 1, 'xy2': 2, 'xyz': 3, 'only_x': 100}
y = {'xy1': 10, 'xy2': 20, 'xyz': 30, 'only_y': 200}
z = {'xyz': 300, 'only_z': 300}

small_tests = [x, y, z]

# 200,000 random 8 letter keys
keys = [''.join(random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8)) for _ in range(200000)]

a, b, c = {}, {}, {}

# 50/50 chance of a value being assigned to each dictionary, some keys will be missed but meh
for key in keys:
    if random.getrandbits(1):
        a[key] = random.randint(0, 1000)
    if random.getrandbits(1):
        b[key] = random.randint(0, 1000)
    if random.getrandbits(1):
        c[key] = random.randint(0, 1000)

large_tests = [a, b, c]

print("a:", len(a), "b:", len(b), "c:", len(c))
#: a: 100069 b: 100385 c: 99989

Now each of the methods:

from collections import defaultdict, Counter
from functools import reduce

def georg_method(tests):
    return {k: sum(t.get(k, 0) for t in tests) for k in set.union(*[set(t) for t in tests])}

def georg_method_nosum(tests):
    # If you know you will have exactly 3 dicts
    return {k: tests[0].get(k, 0) + tests[1].get(k, 0) + tests[2].get(k, 0) for k in set.union(*[set(t) for t in tests])}

def npe_method(tests):
    ret = defaultdict(int)
    for d in tests:
        for k, v in d.items():
            ret[k] += v
    return dict(ret)

# Note: There is a bug with scott's method. See below for details.
# Scott included a similar version using counters that is fixed
# See the scott_update_method below
def scott_method(tests):
    return dict(sum((Counter(t) for t in tests), Counter()))

def scott_method_nosum(tests):
    # If you know you will have exactly 3 dicts
    return dict(Counter(tests[0]) + Counter(tests[1]) + Counter(tests[2]))

def scott_update_method(tests):
    ret = Counter()
    for test in tests:
        ret.update(test)
    return dict(ret)

def scott_update_method_static(tests):
    # If you know you will have exactly 3 dicts
    xx = Counter(tests[0])
    yy = Counter(tests[1])
    zz = Counter(tests[2])
    xx.update(yy)
    xx.update(zz)
    return dict(xx)

def havok_method(tests):
    def reducer(accumulator, element):
        for key, value in element.items():
            accumulator[key] = accumulator.get(key, 0) + value
        return accumulator
    return reduce(reducer, tests, {})

methods = {
    "georg_method": georg_method, "georg_method_nosum": georg_method_nosum,
    "npe_method": npe_method,
    "scott_method": scott_method, "scott_method_nosum": scott_method_nosum,
    "scott_update_method": scott_update_method, "scott_update_method_static": scott_update_method_static,
    "havok_method": havok_method
}

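I also wrote a quick function to find any differences between the results. Unfortunately, that's when I found the problem in Scott's method: if the values for a key total 0 (or less), that key is not included in the result at all, because of how Counter() behaves when adding.

Finally, the results:

Results: Small Tests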
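
The comparison function itself is not shown in the answer. A minimal sketch of what such a check could look like follows; `find_differences`, its report format, and the sample data are assumptions made for illustration.

```python
from collections import Counter


def find_differences(results):
    """Compare every result dict against the first one (hypothetical helper).

    `results` maps a method name to the dictionary that method returned.
    Returns {method_name: {key: (baseline_value, method_value)}} for mismatches.
    """
    if not results:
        return {}
    names = list(results)
    baseline = results[names[0]]
    missing = object()  # sentinel: distinguishes an absent key from any real value
    report = {}
    for name in names[1:]:
        other = results[name]
        diffs = {}
        for key in set(baseline) | set(other):
            expected = baseline.get(key, missing)
            actual = other.get(key, missing)
            if expected != actual:
                diffs[key] = (
                    "<missing>" if expected is missing else expected,
                    "<missing>" if actual is missing else actual,
                )
        if diffs:
            report[name] = diffs
    return report


# Assumed sample data: the values for 'k' cancel out to zero.
a = {'k': 5, 'j': 1}
b = {'k': -5, 'j': 1}
results = {
    "plain_loop": {'k': 0, 'j': 2},
    "counter_add": dict(Counter(a) + Counter(b)),
}
print(find_differences(results))  # {'counter_add': {'k': (0, '<missing>')}}
```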

for name, method in methods.items():
    print("Method:", name)
    %timeit -n10000 method(small_tests)
#: Method: georg_method
#: 7.81 µs ± 321 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: georg_method_nosum
#: 4.6 µs ± 48.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: npe_method
#: 3.2 µs ± 24.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_method
#: 24.9 µs ± 326 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_method_nosum
#: 18.9 µs ± 64.8 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_update_method
#: 9.1 µs ± 90.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: scott_update_method_static
#: 14.4 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#: Method: havok_method
#: 3.09 µs ± 47.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Results: Large Tests

Naturally, I couldn't run anywhere near as many loops on these:

for name, method in methods.items():
    print("Method:", name)
    %timeit -n10 method(large_tests)
#: Method: georg_method
#: 347 ms ± 20 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: georg_method_nosum
#: 280 ms ± 4.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: npe_method
#: 119 ms ± 11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_method
#: 324 ms ± 16.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_method_nosum
#: 289 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_update_method
#: 123 ms ± 1.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: scott_update_method_static
#: 136 ms ± 3.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
#: Method: havok_method
#: 103 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Conclusion

╔═══════════════════════════╦═══════╦═════════════════════════════╗
║                           ║       ║    Best of Time Per Loop    ║
║         Algorithm         ║  By   ╠══════════════╦══════════════╣
║                           ║       ║  small_tests ║  large_tests ║
╠═══════════════════════════╬═══════╬══════════════╬══════════════╣
║ functools reduce          ║ Havok ║       3.1 µs ║   103,000 µs ║
║ defaultdict sum           ║ NPE   ║       3.2 µs ║   119,000 µs ║
║ Counter().update loop     ║ Scott ║       9.1 µs ║   123,000 µs ║
║ Counter().update static   ║ Scott ║      14.4 µs ║   136,000 µs ║
║ set unions without sum()  ║ georg ║       4.6 µs ║   280,000 µs ║
║ set unions with sum()     ║ georg ║       7.8 µs ║   347,000 µs ║
║ Counter() without sum()   ║ Scott ║      18.9 µs ║   289,000 µs ║
║ Counter() with sum()      ║ Scott ║      24.9 µs ║   324,000 µs ║
╚═══════════════════════════╩═══════╩══════════════╩══════════════╝

Important: your mileage may vary.
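
The timings above rely on the IPython `%timeit` magic. Outside IPython, a roughly comparable measurement could be taken with the standard `timeit` module, as in this sketch. The helper `time_method` is hypothetical, and it reports the best run rather than the mean ± standard deviation that `%timeit` prints, so the numbers are not directly comparable.

```python
import timeit
from collections import defaultdict


def npe_method(tests):
    # Same defaultdict approach as in the benchmark above
    ret = defaultdict(int)
    for d in tests:
        for k, v in d.items():
            ret[k] += v
    return dict(ret)


small_tests = [
    {'xy1': 1, 'xy2': 2, 'xyz': 3, 'only_x': 100},
    {'xy1': 10, 'xy2': 20, 'xyz': 30, 'only_y': 200},
    {'xyz': 300, 'only_z': 300},
]


def time_method(func, data, number=1000, repeat=3):
    """Return the best per-call time in seconds (hypothetical helper)."""
    # timeit.repeat returns one total time per repetition of `number` calls
    timings = timeit.repeat(lambda: func(data), number=number, repeat=repeat)
    return min(timings) / number


best = time_method(npe_method, small_tests)
print(f"npe_method: {best * 1e6:.2f} µs per loop")
```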


Hav*_*vok 6

Another option is to use the reduce function. This lets you sum up an arbitrary collection of dictionaries:

from functools import reduce

collection = [
    {'a': 1, 'b': 1},
    {'a': 2, 'b': 2},
    {'a': 3, 'b': 3},
    {'a': 4, 'b': 4, 'c': 1},
    {'a': 5, 'b': 5, 'c': 1},
    {'a': 6, 'b': 6, 'c': 1},
    {'a': 7, 'b': 7},
    {'a': 8, 'b': 8},
    {'a': 9, 'b': 9},
]


def reducer(accumulator, element):
    for key, value in element.items():
        accumulator[key] = accumulator.get(key, 0) + value
    return accumulator


total = reduce(reducer, collection, {})


assert total['a'] == sum(d.get('a', 0) for d in collection)
assert total['b'] == sum(d.get('b', 0) for d in collection)
assert total['c'] == sum(d.get('c', 0) for d in collection)

print(total)

Output:

{'a': 45, 'b': 45, 'c': 3}

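Advantages:

  • Simple, clear, Pythonic.
  • Schema-less, as long as all values are "summable".
  • O(n) time complexity and O(1) additional memory beyond the result dictionary.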
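
As a rough illustration of the "summable" point, the same reducer can be run on values of mixed numeric types. The `mixed` collection below is invented for this example; anything that supports `0 + value` should work, while a string value would raise a `TypeError`.

```python
from fractions import Fraction
from functools import reduce


def reducer(accumulator, element):
    for key, value in element.items():
        # Works for any value type that can be added to 0
        accumulator[key] = accumulator.get(key, 0) + value
    return accumulator


# Assumed sample data mixing int, float and Fraction values
mixed = [
    {'a': 1, 'b': 0.5},
    {'a': Fraction(1, 2), 'c': 2},
    {'b': 1.5, 'c': 3},
]

total = reduce(reducer, mixed, {})
print(total)  # {'a': Fraction(3, 2), 'b': 2.0, 'c': 5}
```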