Python, computing the difference between lists

Mik*_*ike 179 python list

In Python, what is the best way to compute the difference between two lists?

A = [1,2,3,4]
B = [2,5]

# Desired results:
# A - B = [1,3,4]
# B - A = [5]

phi*_*hag 348

If order doesn't matter, you can simply compute the set difference:

>>> set([1,2,3,4]) - set([2,5])
set([1, 4, 3])
>>> set([2,5]) - set([1,2,3,4])
set([5])

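  • Depends on the application: if order or preservation of duplicates matters, Roman Bodnarchuk may have a better approach. For speed and pure set behavior, this one seems better. (15 upvotes)
  • This is by far the best solution. A test case on lists of about 6000 strings showed this method to be almost 100 times faster than a list comprehension. (8 upvotes)
  • This solution looks obvious, but it is incorrect, sorry. The point is that lists can contain repeated equal elements; otherwise we would be asking about the difference between sets, not between lists. (8 upvotes)
  • This solution does not work if a list contains several equal elements. (6 upvotes)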
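The duplicate problem raised in the comments is easy to see directly. If you need multiset semantics (each occurrence in `B` cancels exactly one occurrence in `A`), one standard-library option is `collections.Counter` subtraction. This is not part of phihag's answer, and `multiset_diff` is a hypothetical helper name used only for illustration:

```python
from collections import Counter

A = [1, 1, 2, 3]
B = [2]

# Plain set difference collapses the two 1s into a single 1
print(set(A) - set(B))      # {1, 3}

def multiset_diff(a, b):
    """Remove one occurrence from a for each occurrence in b (hypothetical helper)."""
    remaining = Counter(a) - Counter(b)  # counts that drop to <= 0 are discarded
    return list(remaining.elements())

print(multiset_diff(A, B))  # [1, 1, 3]
```

Note that `elements()` groups equal items together (in first-seen order on Python 3.7+), so it does not reproduce the original interleaving of `A`.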

Rom*_*huk 186

Use a set if you don't care about item order or duplicates. If you do, use a list comprehension:

>>> def diff(first, second):
...     second = set(second)
...     return [item for item in first if item not in second]
...
>>> diff(A, B)
[1, 3, 4]
>>> diff(B, A)
[5]

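  • Consider using `set(b)` so that the algorithm is O(nlogn) rather than Theta(n^2). (29 upvotes)
  • @Pencilcheck - not if you care about ordering or duplicates in A. Applying `set` to B is harmless, but applying it to `A` and using the result instead of the original `A` is not. (8 upvotes)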
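A quick check of Roman's `diff` on an input with duplicates and an unsorted order (the sample lists are made up for illustration): only `second` is turned into a set, so `first` keeps its order and its repeats. Every occurrence of a value found in `second` is removed, not just one.

```python
def diff(first, second):
    second = set(second)  # items of second must be hashable
    return [item for item in first if item not in second]

print(diff([3, 1, 1, 2, 3], [2]))     # [3, 1, 1, 3]
print(diff([3, 1, 1, 2, 3], [1, 3]))  # [2]
```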

Sen*_*ran 65

You can do:

list(set(A) - set(B))
list(set(B) - set(A))

  • But if A = [1,1,1] and B = [0], it returns [1]. (5 upvotes)
  • @cloudy Then this doesn't answer the question. (4 upvotes)
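`list(set(A) - set(B))` also makes no promise about the order of the result. If you want set-like deduplication but still want the first-seen order of `A`, one option (an assumption on my part, not something this answer proposes) is `dict.fromkeys`, whose keys keep insertion order on Python 3.7+. `ordered_set_diff` is a hypothetical name:

```python
def ordered_set_diff(a, b):
    """Items of a not in b, deduplicated, in first-seen order (hypothetical helper)."""
    exclude = set(b)
    # dict keys are unique and preserve insertion order (Python 3.7+)
    return list(dict.fromkeys(x for x in a if x not in exclude))

print(ordered_set_diff([4, 3, 1, 3, 2], [2]))  # [4, 3, 1]
print(ordered_set_diff([1, 1, 1], [0]))        # [1]
```

The second call reproduces the comment above: duplicates are still collapsed, so this only fixes the ordering, not the multiplicity.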

Art*_*nka 25

One-liner:

diff = lambda l1,l2: [x for x in l1 if x not in l2]
diff(A,B)
diff(B,A)

Or:

# list() is needed on Python 3, where filter() returns a lazy iterator
diff = lambda l1,l2: list(filter(lambda x: x not in l2, l1))
diff(A,B)
diff(B,A)


Kev*_*vin 14

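The examples above trivialize the problem of computing a difference. Assuming sorting or deduplication certainly makes the difference easier to compute, but if your comparison cannot afford those assumptions, you will need a non-trivial implementation of a diff algorithm. See difflib in the Python standard library.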

from difflib import SequenceMatcher
from functools import reduce  # reduce is no longer a builtin in Python 3

squeeze = SequenceMatcher(None, A, B)

# Concatenate the slices of A covered by every opcode that is not 'equal'
print("A - B = %s" % reduce(lambda p, q: p + q,
                            map(lambda t: squeeze.a[t[1]:t[2]],
                                filter(lambda x: x[0] != 'equal',
                                       squeeze.get_opcodes()))))

A - B = [1, 3, 4]
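The same idea written as a plain loop may be easier to follow. This is a sketch, and `sequence_diff` is a hypothetical name: `get_opcodes()` returns `(tag, i1, i2, j1, j2)` tuples, and the items of `a` that fall outside the `'equal'` blocks are collected. Because the sequences are aligned in order, a value that does appear in `b` can still be reported when it is not where the alignment expects it, which is exactly what a set-based difference cannot express.

```python
from difflib import SequenceMatcher

def sequence_diff(a, b):
    """Items of a outside the blocks SequenceMatcher aligns as 'equal' (hypothetical helper)."""
    matcher = SequenceMatcher(None, a, b)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            # 'insert' opcodes have i1 == i2, so they contribute nothing from a
            result.extend(a[i1:i2])
    return result

A = [1, 2, 3, 4]
B = [2, 5]
print(SequenceMatcher(None, A, B).get_opcodes())
# [('delete', 0, 1, 0, 0), ('equal', 1, 2, 0, 1), ('replace', 2, 4, 1, 2)]
print(sequence_diff(A, B))            # [1, 3, 4]
print(sequence_diff(B, A))            # [5]
print(sequence_diff([1, 2], [2, 1]))  # [2]: 2 is in b, but not aligned with it
```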


Mor*_*eno 13

Python 2.7.3 (default, Feb 27 2014, 19:58:35) - IPython 1.1.0 - timeit: (github gist)

def diff(a, b):
  b = set(b)
  return [aa for aa in a if aa not in b]

def set_diff(a, b):
  return list(set(a) - set(b))

diff_lamb_hension = lambda l1,l2: [x for x in l1 if x not in l2]

diff_lamb_filter = lambda l1,l2: filter(lambda x: x not in l2, l1)

from difflib import SequenceMatcher
def squeezer(a, b):
  squeeze = SequenceMatcher(None, a, b)
  return reduce(lambda p,q: p+q, map(
    lambda t: squeeze.a[t[1]:t[2]],
      filter(lambda x:x[0]!='equal',
        squeeze.get_opcodes())))

Results:

# Small
a = range(10)
b = range(10/2)

timeit[diff(a, b)]
100000 loops, best of 3: 1.97 µs per loop

timeit[set_diff(a, b)]
100000 loops, best of 3: 2.71 µs per loop

timeit[diff_lamb_hension(a, b)]
100000 loops, best of 3: 2.1 µs per loop

timeit[diff_lamb_filter(a, b)]
100000 loops, best of 3: 3.58 µs per loop

timeit[squeezer(a, b)]
10000 loops, best of 3: 36 µs per loop

# Medium
a = range(10**4)
b = range(10**4/2)

timeit[diff(a, b)]
1000 loops, best of 3: 1.17 ms per loop

timeit[set_diff(a, b)]
1000 loops, best of 3: 1.27 ms per loop

timeit[diff_lamb_hension(a, b)]
1 loops, best of 3: 736 ms per loop

timeit[diff_lamb_filter(a, b)]
1 loops, best of 3: 732 ms per loop

timeit[squeezer(a, b)]
100 loops, best of 3: 12.8 ms per loop

# Big
a = xrange(10**7)
b = xrange(10**7/2)

timeit[diff(a, b)]
1 loops, best of 3: 1.74 s per loop

timeit[set_diff(a, b)]
1 loops, best of 3: 2.57 s per loop

timeit[diff_lamb_hension(a, b)]
# too long to wait for

timeit[diff_lamb_filter(a, b)]
# too long to wait for

timeit[squeezer(a, b)]
# TypeError: sequence index must be integer, not 'slice'

@roman-bodnarchuk's list comprehension function `def diff(a, b)` seems to be faster.
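The timings above come from Python 2.7, where `range` builds a list and `xrange` is lazy. A minimal sketch for rerunning a similar comparison on Python 3 with the standard `timeit` module follows; the sizes and run count are arbitrary choices, `diff_list_lookup` is just a named version of the lambda one-liner, and no timings are claimed here since they depend on your machine.

```python
import timeit

def diff(a, b):
    b = set(b)  # one O(len(b)) conversion, then O(1) average lookups
    return [x for x in a if x not in b]

def set_diff(a, b):
    return list(set(a) - set(b))  # order and duplicates of a are lost

def diff_list_lookup(a, b):
    return [x for x in a if x not in b]  # scans the list b for every x

def main():
    # Arbitrary sizes; Python 3's range() is lazy, so build real lists
    a = list(range(10**4))
    b = list(range(10**4 // 2))
    runs = 3
    for func in (diff, set_diff, diff_list_lookup):
        seconds = timeit.timeit(lambda: func(a, b), number=runs)
        print(f"{func.__name__}: {seconds / runs * 1000:.2f} ms per call")

if __name__ == "__main__":
    main()
```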


Sak*_*rma 9

A = [1,2,3,4]
B = [2,5]

#A - B
x = list(set(A) - set(B))
#B - A 
y = list(set(B) - set(A))

print(x)
print(y)


The*_*uck 8

You may want to use a `set` instead of a `list`.


Sep*_*man 6

If you want the diff to recurse into the items of the list, I've written a package for Python: https://github.com/erasmose/deepdiff

Installation

Install from PyPI:

pip install deepdiff

If you're on Python 3, you also need to install:

pip install future six

Example usage

>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function

The same object returns empty

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> ddiff = DeepDiff(t1, t2)
>>> print (ddiff.changes)
    {}

Type of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> ddiff = DeepDiff(t1, t2)
>>> print (ddiff.changes)
    {'type_changes': ["root[2]: 2=<type 'int'> vs. 2=<type 'str'>"]}

Value of an item has changed

>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> ddiff = DeepDiff(t1, t2)
>>> print (ddiff.changes)
    {'values_changed': ['root[2]: 2 ====>> 4']}

Items added and/or removed

>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes)
    {'dic_item_added': ['root[5, 6]'],
     'dic_item_removed': ['root[4]'],
     'values_changed': ['root[2]: 2 ====>> 4']}

String difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
    { 'values_changed': [ 'root[2]: 2 ====>> 4',
                          "root[4]['b']:\n--- \n+++ \n@@ -1 +1 @@\n-world\n+world!"]}
>>>
>>> print (ddiff.changes['values_changed'][1])
    root[4]['b']:
    --- 
    +++ 
    @@ -1 +1 @@
    -world
    +world!

String difference 2

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
    { 'values_changed': [ "root[4]['b']:\n--- \n+++ \n@@ -1,5 +1,4 @@\n-world!\n-Goodbye!\n+world\n 1\n 2\n End"]}
>>>
>>> print (ddiff.changes['values_changed'][0])
    root[4]['b']:
    --- 
    +++ 
    @@ -1,5 +1,4 @@
    -world!
    -Goodbye!
    +world
     1
     2
     End

Type change

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
    { 'type_changes': [ "root[4]['b']: [1, 2, 3]=<type 'list'> vs. world\n\n\nEnd=<type 'str'>"]}

List difference

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
    { 'list_removed': ["root[4]['b']: [3]"]}

List difference 2: note that it does NOT take order into account

>>> # Note that it DOES NOT take order into account
... t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
    { }

List that contains a dictionary:

>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff.changes, indent = 2)
    { 'dic_item_removed': ["root[4]['b'][2][2]"],
      'values_changed': ["root[4]['b'][2][1]: 1 ====>> 3"]}


Moh*_*med 5

The simplest way:

use set().difference(set())

list_a = [1,2,3]
list_b = [2,3]
print(set(list_a).difference(set(list_b)))

The answer is set([1]) (Python 3 displays it as {1}).
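A small variation worth knowing: the `difference()` method accepts any iterable, so the second `set()` call is optional, whereas the `-` operator requires both operands to be sets. The snippet below illustrates both behaviors with the same sample lists:

```python
list_a = [1, 2, 3]
list_b = [2, 3]

# .difference() accepts any iterable, so list_b need not be converted
print(set(list_a).difference(list_b))   # {1}

# The - operator requires both operands to be sets
try:
    set(list_a) - list_b
except TypeError as exc:
    print("TypeError:", exc)
```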