Calculate the difference in keys contained in two Python dictionaries

166 python dictionary

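Suppose I have two Python dictionaries, dictA and dictB. I need to find out whether there are any keys that are present in dictB but not in dictA. What is the fastest way to go about it?

Should I convert the dictionary keys into a set and then go about it?

Interested in knowing your thoughts...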


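Thanks for your responses.

Apologies for not stating my question properly. My scenario is this: I have a dictA which may be the same as dictB, or may have some keys missing compared to dictB, or else the values of some keys may differ and have to be set to the values in dictB.

The problem is that the dictionary has no fixed structure, and its values can themselves be dicts of dicts.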

dictA={'key1':a, 'key2':b, 'key3':{'key11':cc, 'key12':dd}, 'key4':{'key111':{....}}}
dictB={'key1':a, 'key2':newb, 'key3':{'key11':cc, 'key12':newdd, 'key13':ee}.......

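So the 'key2' value has to be reset to the new value, and 'key13' has to be added inside the nested dict. The values have no fixed format: a value can be a simple value, a dict, or a dict of dicts.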

hug*_*own 233

You can use set operations on the keys:

diff = set(dictb.keys()) - set(dicta.keys())

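Here is a class that finds all the possibilities: what was added, what was removed, which key-value pairs are the same, and which key-value pairs have changed.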

class DictDiffer(object):
    """
    Calculate the difference between two dictionaries as:
    (1) items added
    (2) items removed
    (3) keys same in both but changed values
    (4) keys same in both and unchanged values
    """
    def __init__(self, current_dict, past_dict):
        self.current_dict, self.past_dict = current_dict, past_dict
        self.set_current, self.set_past = set(current_dict.keys()), set(past_dict.keys())
        self.intersect = self.set_current.intersection(self.set_past)
    def added(self):
        return self.set_current - self.intersect 
    def removed(self):
        return self.set_past - self.intersect 
    def changed(self):
        return set(o for o in self.intersect if self.past_dict[o] != self.current_dict[o])
    def unchanged(self):
        return set(o for o in self.intersect if self.past_dict[o] == self.current_dict[o])

Here is some sample output:

>>> a = {'a': 1, 'b': 1, 'c': 0}
>>> b = {'a': 1, 'b': 2, 'd': 0}
>>> d = DictDiffer(b, a)
>>> print("Added:", d.added())
Added: {'d'}
>>> print("Removed:", d.removed())
Removed: {'c'}
>>> print("Changed:", d.changed())
Changed: {'b'}
>>> print("Unchanged:", d.unchanged())
Unchanged: {'a'}

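Available as a GitHub repo: https://github.com/hughdbrown/dictdiffer

  • Smart solution, thanks! I made it work with nested dicts by checking whether the changed or unchanged values are dict instances and calling a recursive function that checks them again using your class. (3 upvotes)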
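The nested-dict idea described in the comment above could look roughly like the sketch below. This is not the commenter's actual code: the function name `nested_diff` and the tuple-based key paths are assumptions made purely for illustration.

```python
def nested_diff(current, past, path=()):
    """Recursively compare two dicts.

    Returns three sets of key paths (tuples): added, removed, changed.
    Hypothetical helper for illustration; it is not part of DictDiffer.
    """
    added, removed, changed = set(), set(), set()
    current_keys, past_keys = set(current), set(past)

    for key in current_keys - past_keys:
        added.add(path + (key,))
    for key in past_keys - current_keys:
        removed.add(path + (key,))

    for key in current_keys & past_keys:
        new, old = current[key], past[key]
        if isinstance(new, dict) and isinstance(old, dict):
            # Both values are dicts: recurse instead of comparing wholesale.
            sub_added, sub_removed, sub_changed = nested_diff(new, old, path + (key,))
            added |= sub_added
            removed |= sub_removed
            changed |= sub_changed
        elif new != old:
            changed.add(path + (key,))
    return added, removed, changed


past = {'key1': 1, 'key2': 2, 'key3': {'key11': 3, 'key12': 4}}
current = {'key1': 1, 'key2': 20, 'key3': {'key11': 3, 'key12': 40, 'key13': 5}}
added, removed, changed = nested_diff(current, past)
print(added)    # {('key3', 'key13')}
print(removed)  # set()
print(changed)  # contains ('key2',) and ('key3', 'key12'); set order may vary
```

Reporting full key paths rather than bare keys keeps a change deep inside 'key3' distinguishable from a change to 'key3' itself.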

Sep*_*man 56

If you want the difference recursively, I have written a package for Python: https://github.com/seperman/deepdiff

Installation

Install from PyPi:

pip install deepdiff

Example usage

Importing

>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function # In case running on Python 2

Same object returns empty

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> print(DeepDiff(t1, t2))
{}

Type of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{ 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
                                 'newvalue': '2',
                                 'oldtype': <class 'int'>,
                                 'oldvalue': 2}}}

Value of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

Item added and/or removed

>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff)
{'dic_item_added': ['root[5]', 'root[6]'],
 'dic_item_removed': ['root[4]'],
 'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}

String difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
                      "root[4]['b']": { 'newvalue': 'world!',
                                        'oldvalue': 'world'}}}

String difference 2

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
                                                '+++ \n'
                                                '@@ -1,5 +1,4 @@\n'
                                                '-world!\n'
                                                '-Goodbye!\n'
                                                '+world\n'
                                                ' 1\n'
                                                ' 2\n'
                                                ' End',
                                        'newvalue': 'world\n1\n2\nEnd',
                                        'oldvalue': 'world!\n'
                                                    'Goodbye!\n'
                                                    '1\n'
                                                    '2\n'
                                                    'End'}}}

>>> 
>>> print (ddiff['values_changed']["root[4]['b']"]["diff"])
--- 
+++ 
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
 1
 2
 End

Type change

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
                                      'newvalue': 'world\n\n\nEnd',
                                      'oldtype': <class 'list'>,
                                      'oldvalue': [1, 2, 3]}}}

List difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3, 4]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}

List difference 2:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
  'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
                      "root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}

List difference ignoring order or duplicates (with the same dictionaries as above):

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print (ddiff)
{}

List that contains a dictionary:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
  'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}

Sets:

>>> t1 = {1, 2, 8}
>>> t2 = {1, 2, 3, 5}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (DeepDiff(t1, t2))
{'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}

Named tuples:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> t1 = Point(x=11, y=22)
>>> t2 = Point(x=11, y=23)
>>> pprint (DeepDiff(t1, t2))
{'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}

Custom objects:

>>> class ClassA(object):
...     a = 1
...     def __init__(self, b):
...         self.b = b
... 
>>> t1 = ClassA(1)
>>> t2 = ClassA(2)
>>> 
>>> pprint(DeepDiff(t1, t2))
{'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}

Object attribute added:

>>> t2.c = "new attribute"
>>> pprint(DeepDiff(t1, t2))
{'attribute_added': ['root.c'],
 'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}


gho*_*g74 18

Not sure whether it's "fast" or not, but normally, one can do this:

dicta = {"a":1,"b":2,"c":3,"d":4}
dictb = {"a":1,"d":2}
# print every key of dicta that is missing from dictb
for key in dicta.keys():
    if key not in dictb:
        print(key)

  • `for key in dicta.keys():` => `for key in dicta:` (2 upvotes)

Joc*_*zel 15

As Alex Martelli wrote, if you simply want to check whether any key in B is not in A, any(True for k in dictB if k not in dictA) would be the way to go.

To find the keys that are missing:

diff = set(dictB)-set(dictA) #sets

C:\Dokumente und Einstellungen\thc>python -m timeit -s "dictA = dict(zip(range(1000),range(1000))); dictB = dict(zip(range(0,2000,2),range(1000)))" "diff=set(dictB)-set(dictA)"
10000 loops, best of 3: 107 usec per loop

diff = [ k for k in dictB if k not in dictA ] #lc

C:\Dokumente und Einstellungen\thc>python -m timeit -s "dictA = dict(zip(range(1000),range(1000))); dictB = dict(zip(range(0,2000,2),range(1000)))" "diff=[ k for k in dictB if k not in dictA ]"
10000 loops, best of 3: 95.9 usec per loop

So the two solutions run at pretty much the same speed.
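For anyone who would rather repeat the measurement from a script than from the command line, here is a minimal sketch using the standard timeit module with the same test data. The function names are made up for this example, and absolute numbers will differ by machine and Python version.

```python
import timeit

dictA = dict(zip(range(1000), range(1000)))
dictB = dict(zip(range(0, 2000, 2), range(1000)))


def with_sets():
    return set(dictB) - set(dictA)


def with_list_comprehension():
    return [k for k in dictB if k not in dictA]


# Both approaches must find the same keys before timing means anything.
assert with_sets() == set(with_list_comprehension())

for func in (with_sets, with_list_comprehension):
    seconds = timeit.timeit(func, number=1000)
    print("%s: %.1f usec per loop" % (func.__name__, seconds / 1000 * 1e6))
```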

  • This makes more sense: `any(k not in dictA for k in dictB)` (8 upvotes)

Ale*_*lli 13

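If you really mean exactly what you say (that you only need to find out IF "there are any keys" in B and not in A, not WHICH ONES those might be, if any), the fastest way should be: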

if any(True for k in dictB if k not in dictA): ...

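If you actually need to find out WHICH KEYS, if any, are in B and not in A, and not just "IF" there are such keys, then the existing answers are quite appropriate (but I do suggest more precision in future questions if that's indeed what you mean ;-).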
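A small sketch of the yes/no check, written with the slightly shorter generator form any(k not in dictA for k in dictB), which gives the same result. The function name `has_extra_keys` and the sample data are made up for illustration.

```python
def has_extra_keys(dictA, dictB):
    """Return True if at least one key of dictB is missing from dictA."""
    # any() stops at the first key that is not found, so all of dictB
    # is only scanned when every one of its keys is present in dictA.
    return any(k not in dictA for k in dictB)


dictA = {'a': 1, 'b': 2, 'c': 3}
print(has_extra_keys(dictA, {'a': 1, 'd': 4}))  # True
print(has_extra_keys(dictA, {'a': 1, 'b': 5}))  # False
print(has_extra_keys(dictA, {}))                # False
```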


Anonymous 8

Use set():

set(dictA.keys()).intersection(dictB.keys())
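Note that intersection() returns the keys the two dicts share. For the keys that are in dictB but missing from dictA, which is what the question asks about, difference() is the matching set method. A small sketch with made-up sample data shows both:

```python
dictA = {'a': 1, 'b': 2, 'c': 3}
dictB = {'a': 1, 'b': 5, 'd': 4}

# Keys present in both dictionaries.
common = set(dictA.keys()).intersection(dictB.keys())

# Keys present in dictB but missing from dictA.
only_in_b = set(dictB.keys()).difference(dictA.keys())

print(sorted(common))     # ['a', 'b']
print(sorted(only_in_b))  # ['d']
```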


sof*_*lay 5

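There is another question on Stack Overflow about this topic, and I have to admit that a simple solution is explained there: the datadiff library for Python helps print the difference between two dictionaries.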


aba*_*ert 5

The top answer by hughdbrown suggests using set difference, which is definitely the best approach:

diff = set(dictb.keys()) - set(dicta.keys())

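The problem with this code is that it builds two lists (in Python 2, where keys() returns a list) just to create two sets, so it wastes 4N time and 2N space. It is also a bit more complicated than it needs to be.

Usually this is not a big deal, but if it is: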

diff = dictb.keys() - dicta
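A short Python 3 sketch of that one-liner with made-up sample data. It relies on dict key views being set-like, so the right-hand operand can be any iterable, including another dict.

```python
dicta = {'a': 1, 'b': 2, 'c': 3}
dictb = {'a': 1, 'b': 5, 'd': 4}

# dict.keys() returns a set-like view in Python 3, so no intermediate
# list is built; iterating over dicta yields its keys.
diff = dictb.keys() - dicta
print(diff)  # {'d'}

# The other set operators work on key views as well.
print(sorted(dictb.keys() & dicta))  # ['a', 'b']
```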

Python 2

In Python 2, keys() returns a list of the keys, not a KeysView. So you have to ask for viewkeys() directly.

diff = dictb.viewkeys() - dicta

For dual-version 2.7/3.x code, you're hopefully using six or something similar, so you can use six.viewkeys(dictb):

diff = six.viewkeys(dictb) - dicta

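In 2.4-2.6, there is no KeysView. But you can at least cut the cost from 4N to N by building your left set directly from an iterator instead of building a list first (use difference() here, because the - operator requires both operands to be sets):

diff = set(dictb).difference(dicta)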

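Items

"I have a dictA which can be the same as dictB or may have some keys missing as compared to dictB, or else the value of some keys might be different"

So you really don't need to compare the keys, but the items. An ItemsView is only a Set if the values are hashable, like strings. If they are, it's easy: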

diff = dictb.items() - dicta.items()
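When some values are not hashable (lists or nested dicts, as in the question), the items() subtraction raises TypeError. One possible way to cope, sketched below, is to fall back to a pairwise comparison. The helper name `changed_items` and the choice to return a dict are assumptions for this example.

```python
def changed_items(dicta, dictb):
    """Return the entries of dictb that are new or differ from dicta."""
    try:
        # Fast path: works only when every value in both dicts is hashable.
        return dict(dictb.items() - dicta.items())
    except TypeError:
        # Unhashable values (lists, nested dicts): compare pair by pair.
        return {k: v for k, v in dictb.items()
                if k not in dicta or dicta[k] != v}


flat_a = {'key1': 'a', 'key2': 'b'}
flat_b = {'key1': 'a', 'key2': 'newb'}
print(changed_items(flat_a, flat_b))
# {'key2': 'newb'}

nested_a = {'key1': 'a', 'key3': {'key11': 'cc'}}
nested_b = {'key1': 'a', 'key3': {'key11': 'cc', 'key13': 'ee'}}
print(changed_items(nested_a, nested_b))
# {'key3': {'key11': 'cc', 'key13': 'ee'}}
```

The fallback reports a changed nested dict as a whole; it does not say which inner key differs.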

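Recursive diff

Although the question isn't directly asking for a recursive diff, some of the example values are dicts, and it appears the expected output does diff them recursively. There are already multiple answers here showing how to do that.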
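If the goal is to bring dictA in line with dictB rather than just report the differences, a recursive update along these lines may be enough. This is only a sketch: the function name `sync`, the choice to leave keys that exist only in dictA untouched, and the sample values are all assumptions.

```python
def sync(target, source):
    """Update target in place so it holds every key and value of source.

    Nested dicts are merged recursively; keys that exist only in target
    are left untouched. Returns the key paths that were added or reset.
    """
    touched = []
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Recurse and prefix the nested paths with the current key.
            touched += [(key,) + path for path in sync(target[key], value)]
        elif key not in target or target[key] != value:
            # The value is stored by reference; use copy.deepcopy(value)
            # if target must not share nested objects with source.
            target[key] = value
            touched.append((key,))
    return touched


dictA = {'key1': 'a', 'key2': 'b', 'key3': {'key11': 'cc', 'key12': 'dd'}}
dictB = {'key1': 'a', 'key2': 'newb',
         'key3': {'key11': 'cc', 'key12': 'newdd', 'key13': 'ee'}}

print(sync(dictA, dictB))
# [('key2',), ('key3', 'key12'), ('key3', 'key13')]
print(dictA == dictB)  # True
```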