Python: comparing two lists of instances

Question


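I have two lists of instances, list1 and list2.

Each instance has attributes such as id, name, and so on. I am iterating over list2 and want to find the entries that do not exist in list1. For example:

for entry in list2:
    if entry.id in list1:

I am hoping to find a way to do this without a double loop. Is there a simple way?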

Answer 1

mgi*_*son 9

I would probably do something like this:

set1 = set((x.id,x.name,...) for x in list1)
difference = [ x for x in list2 if (x.id,x.name,...) not in set1 ]


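Here ... stands for additional (hashable) attributes of the instance. You need to include enough of them to make each tuple unique.

This takes your O(N*M) algorithm and turns it into an O(max(N,M)) algorithm: building the set is one pass over list1, and each membership test against it takes constant time on average.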
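
As a minimal runnable sketch of this idea: the `Item` class and the `key` helper below are hypothetical names used only for illustration, and the sketch assumes that `id` and `name` together are enough to identify an instance (the answer above leaves the attribute list open).

```python
class Item:
    """Hypothetical stand-in for the asker's instances."""
    def __init__(self, id, name):
        self.id = id
        self.name = name


def key(x):
    # Assumption: (id, name) identifies an instance uniquely.
    return (x.id, x.name)


list1 = [Item(1, 'a'), Item(1, 'b'), Item(2, 'b'), Item(3, 'c')]
list2 = [Item(1, 'a'), Item(2, 'c'), Item(2, 'b'), Item(4, 'c')]

# One pass over list1 to build the lookup set ...
seen = {key(x) for x in list1}
# ... then one pass over list2; each membership test is O(1) on average.
missing = [x for x in list2 if key(x) not in seen]

print([key(x) for x in missing])  # [(2, 'c'), (4, 'c')]
```

If `id` alone is unique, `key` could simply return `x.id`. Unlike a set difference, the list comprehension keeps the entries in their list2 order.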

Answer 2

unu*_*tbu 6

Just a thought...

class Foo(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name
    def __repr__(self):
        return '({},{})'.format(self.id, self.name)

list1 = [Foo(1,'a'),Foo(1,'b'),Foo(2,'b'),Foo(3,'c'),]
list2 = [Foo(1,'a'),Foo(2,'c'),Foo(2,'b'),Foo(4,'c'),]


Normally, this does not work, because instances are hashed and compared by identity by default:

print(set(list1)-set(list2))
# set([(1,b), (2,b), (3,c), (1,a)])


But you can teach Foo what it means for two instances to be equal:

def __hash__(self):
    return hash((self.id, self.name))

def __eq__(self, other):
    try:
        return (self.id, self.name) == (other.id, other.name)
    except AttributeError:
        return NotImplemented

Foo.__hash__ = __hash__
Foo.__eq__ = __eq__


Now:

print(set(list1)-set(list2))
# set([(3,c), (1,b)])


Of course, more likely you would define __hash__ and __eq__ on Foo when the class is defined, rather than monkey-patching them in afterwards:

class Foo(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __repr__(self):
        return '({},{})'.format(self.id, self.name)

    def __hash__(self):
        return hash((self.id, self.name))

    def __eq__(self, other):
        try:
            return (self.id, self.name) == (other.id, other.name)
        except AttributeError:
            return NotImplemented
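
On Python 3.7 and later, a frozen dataclass generates comparable `__eq__` and `__hash__` methods automatically. This is a swapped-in alternative rather than what the answer itself uses; the sketch below assumes that both fields should take part in the comparison and that instances do not need to be mutated after creation.

```python
from dataclasses import dataclass


# frozen=True together with the default eq=True makes the dataclass
# generate both __eq__ and __hash__ from the declared fields.
@dataclass(frozen=True)
class Foo:
    id: int
    name: str


list1 = [Foo(1, 'a'), Foo(1, 'b'), Foo(2, 'b'), Foo(3, 'c')]
list2 = [Foo(1, 'a'), Foo(2, 'c'), Foo(2, 'b'), Foo(4, 'c')]

only_in_list1 = set(list1) - set(list2)
print(only_in_list1 == {Foo(1, 'b'), Foo(3, 'c')})  # True
```

Note that in Python 3 a class that defines `__eq__` without also defining `__hash__` becomes unhashable, so the two methods should be defined together whenever instances need to go into a set.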


To satisfy my own curiosity, here is a benchmark:

In [34]: list1 = [Foo(1,'a'),Foo(1,'b'),Foo(2,'b'),Foo(3,'c')]*10000

In [35]: list2 = [Foo(1,'a'),Foo(2,'c'),Foo(2,'b'),Foo(4,'c')]*10000
In [40]: %timeit set1 = set((x.id,x.name) for x in list1); [x for x in list2 if (x.id,x.name) not in set1 ]
100 loops, best of 3: 15.3 ms per loop

In [41]: %timeit set1 = set(list1); [x for x in list2 if x not in set1]
10 loops, best of 3: 33.2 ms per loop


So @mgilson's approach is faster (about 15 ms versus 33 ms per loop here), but defining __hash__ and __eq__ on Foo leads to more readable code.
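
To repeat the comparison outside IPython, the standard `timeit` module can be used. The following is a rough sketch rather than the exact benchmark above: the function names `by_tuple_key` and `by_instance_hash` are made up for illustration, and the absolute timings will depend on the machine and Python version.

```python
import timeit


class Foo:
    def __init__(self, id, name):
        self.id = id
        self.name = name

    def __hash__(self):
        return hash((self.id, self.name))

    def __eq__(self, other):
        try:
            return (self.id, self.name) == (other.id, other.name)
        except AttributeError:
            return NotImplemented


list1 = [Foo(1, 'a'), Foo(1, 'b'), Foo(2, 'b'), Foo(3, 'c')] * 10000
list2 = [Foo(1, 'a'), Foo(2, 'c'), Foo(2, 'b'), Foo(4, 'c')] * 10000


def by_tuple_key():
    # Hash plain tuples of attributes.
    set1 = set((x.id, x.name) for x in list1)
    return [x for x in list2 if (x.id, x.name) not in set1]


def by_instance_hash():
    # Hash the instances themselves via Foo.__hash__ and Foo.__eq__.
    set1 = set(list1)
    return [x for x in list2 if x not in set1]


for func in (by_tuple_key, by_instance_hash):
    seconds = timeit.timeit(func, number=10) / 10
    print('{}: {:.1f} ms per call'.format(func.__name__, seconds * 1000))
```

The instance-based version is probably slower because every hash and equality check goes through a Python-level method call, whereas tuples of built-in values are hashed and compared in C.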
