I would probably do it like this:
set1 = set((x.id,x.name,...) for x in list1)
difference = [ x for x in list2 if (x.id,x.name,...) not in set1 ]
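where the ... stands for additional (hashable) attributes of the instance; you need to include enough of them to make the tuple unique.
This takes your O(N*M) algorithm and turns it into an O(max(N,M)) algorithm, because building the set is linear and each membership test against it takes constant time on average.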
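As a minimal, self-contained sketch of the idea above: the Item class and the key helper are hypothetical names introduced only for illustration, and the key is assumed to consist of just id and name.

```python
class Item:
    """Hypothetical stand-in for the real objects (no __eq__/__hash__ defined)."""

    def __init__(self, id, name):
        self.id = id
        self.name = name


def key(x):
    # Tuple of hashable attributes assumed to identify an item uniquely.
    return (x.id, x.name)


list1 = [Item(1, "a"), Item(1, "b"), Item(2, "b"), Item(3, "c")]
list2 = [Item(1, "a"), Item(2, "c"), Item(2, "b"), Item(4, "c")]

# Building the set is O(N); each membership test is O(1) on average.
seen = {key(x) for x in list1}

# Items of list2 whose key does not occur in list1, in original order: O(M).
difference = [x for x in list2 if key(x) not in seen]

print([key(x) for x in difference])  # [(2, 'c'), (4, 'c')]
```

Unlike a set difference, the list comprehension keeps the order of list2 and any duplicates in it.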
Just a thought...
class Foo(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __repr__(self):
        return '({},{})'.format(self.id, self.name)

list1 = [Foo(1,'a'),Foo(1,'b'),Foo(2,'b'),Foo(3,'c'),]
list2 = [Foo(1,'a'),Foo(2,'c'),Foo(2,'b'),Foo(4,'c'),]
So, ordinarily, this doesn't work:
print(set(list1)-set(list2))
# set([(1,b), (2,b), (3,c), (1,a)])
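The reason is that, without custom __eq__ and __hash__, Python compares and hashes instances by identity, so two objects with the same fields still count as different set elements. A small sketch of that default behaviour, using a hypothetical PlainFoo class so that it does not interfere with the Foo defined above:

```python
class PlainFoo(object):
    """Hypothetical class with no __eq__ or __hash__ of its own."""

    def __init__(self, id, name):
        self.id = id
        self.name = name


a1 = PlainFoo(1, 'a')
a2 = PlainFoo(1, 'a')

print(a1 == a2)        # False: default equality is identity
print(a1 == a1)        # True
print(len({a1, a2}))   # 2: both are kept as distinct elements
```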
But you can teach Foo what it means for two instances to be equal:
def __hash__(self):
    return hash((self.id, self.name))

def __eq__(self, other):
    try:
        return (self.id, self.name) == (other.id, other.name)
    except AttributeError:
        return NotImplemented

Foo.__hash__ = __hash__
Foo.__eq__ = __eq__
Now:
print(set(list1)-set(list2))
# set([(3,c), (1,b)])
Of course, more likely than not, you would define __hash__ and __eq__ on Foo at class-definition time rather than monkey-patching them in later:
class Foo(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __repr__(self):
        return '({},{})'.format(self.id, self.name)

    def __hash__(self):
        return hash((self.id, self.name))

    def __eq__(self, other):
        try:
            return (self.id, self.name) == (other.id, other.name)
        except AttributeError:
            return NotImplemented
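To check the class above end to end, here is a self-contained sketch that repeats the definition and the sample lists from earlier. The comparison with a plain string at the end is an added illustration, not part of the original answer: it shows that returning NotImplemented lets Python fall back to its default comparison, so the result is simply False rather than an exception.

```python
class Foo(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __repr__(self):
        return '({},{})'.format(self.id, self.name)

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash((self.id, self.name))

    def __eq__(self, other):
        try:
            return (self.id, self.name) == (other.id, other.name)
        except AttributeError:
            return NotImplemented


list1 = [Foo(1, 'a'), Foo(1, 'b'), Foo(2, 'b'), Foo(3, 'c')]
list2 = [Foo(1, 'a'), Foo(2, 'c'), Foo(2, 'b'), Foo(4, 'c')]

diff = set(list1) - set(list2)
# Sets are unordered, so sort before printing.
print(sorted(diff, key=lambda x: (x.id, x.name)))   # [(1,b), (3,c)]

# Equal fields now mean equal objects.
print(Foo(1, 'a') == Foo(1, 'a'))   # True

# A str has no .id, so __eq__ returns NotImplemented and Python
# falls back to identity comparison.
print(Foo(1, 'a') == 'a')           # False
```

Note that in Python 3 a class that defines __eq__ without also defining __hash__ becomes unhashable, so the two methods belong together if instances are meant to go into sets.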
To satisfy my own curiosity, here is a benchmark:
In [34]: list1 = [Foo(1,'a'),Foo(1,'b'),Foo(2,'b'),Foo(3,'c')]*10000
In [35]: list2 = [Foo(1,'a'),Foo(2,'c'),Foo(2,'b'),Foo(4,'c')]*10000
In [40]: %timeit set1 = set((x.id,x.name) for x in list1); [x for x in list2 if (x.id,x.name) not in set1 ]
100 loops, best of 3: 15.3 ms per loop
In [41]: %timeit set1 = set(list1); [x for x in list2 if x not in set1]
10 loops, best of 3: 33.2 ms per loop
So @mgilson's method is faster (about 15 ms versus 33 ms per loop here), but defining __hash__ and __eq__ on Foo leads to more readable code.
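For anyone who wants to repeat the comparison outside IPython, the following is a rough sketch using the standard timeit module. The function names tuple_keys and hashed_objects are hypothetical labels for the two approaches, and the absolute timings will depend on the machine and the Python version, so no expected numbers are shown.

```python
import timeit


class Foo(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __hash__(self):
        return hash((self.id, self.name))

    def __eq__(self, other):
        try:
            return (self.id, self.name) == (other.id, other.name)
        except AttributeError:
            return NotImplemented


list1 = [Foo(1, 'a'), Foo(1, 'b'), Foo(2, 'b'), Foo(3, 'c')] * 10000
list2 = [Foo(1, 'a'), Foo(2, 'c'), Foo(2, 'b'), Foo(4, 'c')] * 10000


def tuple_keys():
    # Approach 1: hash plain tuples of attributes.
    set1 = set((x.id, x.name) for x in list1)
    return [x for x in list2 if (x.id, x.name) not in set1]


def hashed_objects():
    # Approach 2: hash the objects themselves via Foo.__hash__/__eq__.
    set1 = set(list1)
    return [x for x in list2 if x not in set1]


# Both approaches should select the same items before we time them.
assert tuple_keys() == hashed_objects()

for fn in (tuple_keys, hashed_objects):
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print('{}: {:.1f} ms per loop'.format(fn.__name__, best * 1e3))
```

The second approach is expected to be slower because every hash and equality check goes through a Python-level method call, whereas tuples of built-in values are hashed and compared in C.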