How to efficiently find all differences between two dictionaries in Python

jo2*_*248 1 python dictionary

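So, I have 2 dictionaries, and I need to find the keys that are missing from each one and, for the keys they share, check whether the values are the same or different.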

dict1 = {..}
dict2 = {..}
# keys missing from one dictionary but present in the other
missing_in_dict1_but_in_dict2 = []
missing_in_dict2_but_in_dict1 = []
# keys present in both dictionaries whose values differ
mismatch = []

What is the most efficient way to do this?

Mar*_*ers 5

You can use dictionary view objects, which behave like sets. Subtract one from the other to get the difference:

missing_in_dict1_but_in_dict2 = dict2.keys() - dict1
missing_in_dict2_but_in_dict1 = dict1.keys() - dict2

For the keys that are present in both, take the intersection with the & operator and compare the values:

mismatch = {key for key in dict1.keys() & dict2 if dict1[key] != dict2[key]}

If you are still using Python 2, use dict.viewkeys() instead of dict.keys().

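Producing intersections and differences from dictionary views is very efficient: the view objects themselves are very lightweight, and the algorithms that build new sets from the set operations can make direct use of the O(1) lookup behaviour of the underlying dictionaries.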
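To make the "views are lightweight and set-like" point concrete, here is a small sketch. The variable names are only illustrative; the behaviour shown (key views implementing the `Set` ABC, tracking later changes to the dictionary, and returning a plain `set` from set operators) is standard Python 3.

```python
from collections.abc import Set

d = {'foo': 42, 'bar': 81}
keys = d.keys()                      # a view onto d, not a copy of its keys

is_set_like = isinstance(keys, Set)  # key views implement the Set ABC

d['spam'] = 'ham'                    # the view reflects later changes to d
has_spam = 'spam' in keys

common = keys & {'bar', 'baz'}       # set operators build a new plain set
print(is_set_like, has_spam, common)  # True True {'bar'}
```

Because the view never copies the keys, the only new object created by each set operation is the result set itself.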

Demo:

>>> dict1 = {'foo': 42, 'bar': 81}
>>> dict2 = {'bar': 117, 'spam': 'ham'}
>>> dict2.keys() - dict1
{'spam'}
>>> dict1.keys() - dict2
{'foo'}
>>> {key for key in dict1.keys() & dict2 if dict1[key] != dict2[key]}
{'bar'}

And a performance comparison against creating separate set() objects:

>>> import timeit
>>> import random
>>> def difference_views(d1, d2):
...     missing1 = d2.keys() - d1
...     missing2 = d1.keys() - d2
...     mismatch = {k for k in d1.keys() & d2 if d1[k] != d2[k]}
...     return missing1, missing2, mismatch
...
>>> def difference_sets(d1, d2):
...     missing1 = set(d2) - set(d1)
...     missing2 = set(d1) - set(d2)
...     mismatch = {k for k in set(d1) & set(d2) if d1[k] != d2[k]}
...     return missing1, missing2, mismatch
...
>>> testd1 = {random.randrange(1000000): random.randrange(1000000) for _ in range(10000)}
>>> testd2 = {random.randrange(1000000): random.randrange(1000000) for _ in range(10000)}
>>> timeit.timeit('d(d1, d2)', 'from __main__ import testd1 as d1, testd2 as d2, difference_views as d', number=1000)
1.8643521590274759
>>> timeit.timeit('d(d1, d2)', 'from __main__ import testd1 as d1, testd2 as d2, difference_sets as d', number=1000)
2.811345119960606

Using set() objects is slower, especially as your input dictionaries get larger.
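If you need all three results in several places, the view-based expressions can be wrapped in one helper. This is only a sketch: `diff_dicts` and `DictDiff` are hypothetical names introduced here, not part of the standard library, and the field names are just one possible choice.

```python
from typing import NamedTuple


class DictDiff(NamedTuple):
    """Hypothetical container for the three results."""
    only_in_first: set
    only_in_second: set
    mismatched: set


def diff_dicts(d1, d2):
    """Compare two dicts by key, then by value for the keys they share."""
    only_in_first = d1.keys() - d2    # keys missing from d2
    only_in_second = d2.keys() - d1   # keys missing from d1
    # shared keys whose values differ
    mismatched = {k for k in d1.keys() & d2 if d1[k] != d2[k]}
    return DictDiff(only_in_first, only_in_second, mismatched)


result = diff_dicts({'foo': 42, 'bar': 81}, {'bar': 117, 'spam': 'ham'})
print(result.only_in_first, result.only_in_second, result.mismatched)
```

With the demo dictionaries, the three fields are `{'foo'}`, `{'spam'}` and `{'bar'}`. A named tuple keeps the call site readable while still unpacking like the plain tuple returned by difference_views.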