Algorithm to compare two lists and get the common elements in Python

Jos*_*siP 0 python

I have two lists that share some common elements:

p = [('link1/d/b/c', 'target1/d/b/c'), ('link2/a/g/c', 'target2/a/g/c'), ..., ('linkn/b/b/f', 'targetn/b/b/f')]

q = [['target1/d/b/c', 'target1', 123, 334], ['targetn/b/b/f', 'targetn', 23, 64], ... ,['targetx/f/f/f', 'targetx', 999, 888]]

I am trying to compare them, find the common elements, and then do some work with each result:

do_some_job('target1/d/b/c', 'target1', 123, 334, 'link1/d/b/c')

Right now I'm using a simple but very slow algorithm:

for link, target in p:
    for item2 in q:
        target2 = item2[0]
        if target2 == target:
            # same argument order as the example call above: q's row, then the link
            do_some_job(*item2, link)

I know I need to compare the two lists and build a single list that contains all the elements, for example:

pq = [['target1/d/b/c', 'target1', 123, 334, 'link1/d/b/c'], ..., ['targetn/b/b/f', 'targetn', 23, 64, 'linkn/b/b/f']]

Then I could call do_some_job(pq) once, instead of calling it every time I find a matching element.
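A minimal sketch of building pq without the nested loop: index q by its first column (the target path) in a dictionary, then look up each target from p in that index. It assumes, as in the sample rows shown in this thread, that the target is always the first element of a q row and the second element of a p tuple; `build_pq` is a hypothetical helper name. A dict of lists keeps every match if a target appears more than once in q, just as the nested loop would.

```python
from collections import defaultdict


def build_pq(p, q):
    """Join p and q on the target path; each result row is q's row plus the link."""
    # Index q by target path (first column); a list per key keeps duplicate targets.
    by_target = defaultdict(list)
    for row in q:
        by_target[row[0]].append(row)

    pq = []
    for link, target in p:
        # .get() does not trigger the defaultdict factory, so missing targets add nothing.
        for row in by_target.get(target, ()):
            pq.append(list(row) + [link])
    return pq


p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link2/a/g/c', 'target2/a/g/c'),
     ('linkn/b/b/f', 'targetn/b/b/f')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64],
     ['targetx/f/f/f', 'targetx', 999, 888]]

pq = build_pq(p, q)
print(pq)
# [['target1/d/b/c', 'target1', 123, 334, 'link1/d/b/c'],
#  ['targetn/b/b/f', 'targetn', 23, 64, 'linkn/b/b/f']]
```

Building the index costs one pass over q and each lookup is O(1) on average, so the whole join is roughly O(len(p) + len(q)) instead of O(len(p) × len(q)). The resulting list can then be handed to do_some_job(pq) in a single call.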

How can I do this?

Best regards

Ash*_*ary 5

Use chain() to flatten both lists, then use set().intersection() to get the common elements.

In [78]: from itertools import chain

In [79]: p
Out[79]: 
[('link1/d/b/c', 'target1/d/b/c'),
 ('link2/a/g/c', 'target2/a/g/c'),
 ('linkn/b/b/f', 'targetn/b/b/f')]

In [80]: q
Out[80]: 
[['target1/d/b/c', 'target1', 123, 334],
 ['targetn/b/b/f', 'targetn', 23, 64],
 ['targetx/f/f/f', 'targetx', 999, 888]]

In [81]: set(chain(*p)).intersection(set(chain(*q)))
Out[81]: set(['target1/d/b/c', 'targetn/b/b/f'])
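Flattening with chain() puts every string from both lists into the comparison, so a link could in principle match a target by accident. A sketch that compares only the target columns, assuming (as in the sample data) that the target is the second field of each p tuple and the first field of each q row, builds two sets with set comprehensions and intersects them with the & operator:

```python
p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link2/a/g/c', 'target2/a/g/c'),
     ('linkn/b/b/f', 'targetn/b/b/f')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64],
     ['targetx/f/f/f', 'targetx', 999, 888]]

# Target paths only: second element of each p tuple, first element of each q row.
p_targets = {target for _link, target in p}
q_targets = {row[0] for row in q}

common = p_targets & q_targets  # same result as p_targets.intersection(q_targets)
print(sorted(common))  # sets are unordered, so sort for stable output
# ['target1/d/b/c', 'targetn/b/b/f']
```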

Or use a list comprehension with short-circuiting (`in` stops consuming the generator at the first match):

In [86]: [j for i in p for j in i if j in (z for y in q for z in y)]
Out[86]: ['target1/d/b/c', 'targetn/b/b/f']
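In that comprehension the generator over q is rebuilt and scanned again for every element of p, so the cost grows with len(p) × len(q). A sketch that keeps the same p-ordered output but builds the lookup set only once, using itertools.chain.from_iterable (which flattens one level like chain(*q), without unpacking q into call arguments):

```python
from itertools import chain

p = [('link1/d/b/c', 'target1/d/b/c'),
     ('link2/a/g/c', 'target2/a/g/c'),
     ('linkn/b/b/f', 'targetn/b/b/f')]
q = [['target1/d/b/c', 'target1', 123, 334],
     ['targetn/b/b/f', 'targetn', 23, 64],
     ['targetx/f/f/f', 'targetx', 999, 888]]

# Build the membership set once: one pass over q, then O(1) average lookups.
q_items = set(chain.from_iterable(q))

# Walk p in order, so the result keeps p's ordering (and any duplicates in p).
common = [x for x in chain.from_iterable(p) if x in q_items]
print(common)
# ['target1/d/b/c', 'targetn/b/b/f']
```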

Or use any():

In [87]: [j for i in p for j in i if any (j==z for y in q for z in y)]
Out[87]: ['target1/d/b/c', 'targetn/b/b/f']

Timings:

In [93]: %timeit set(chain(*p)).intersection(set(chain(*q)))
100000 loops, best of 3: 7.38 us per loop                     ##  winner

In [94]: %timeit [j for i in p for j in i if j in (z for y in q for z in y)]
10000 loops, best of 3: 24.9 us per loop

In [95]: %timeit [j for i in p for j in i if any (j==z for y in q for z in y)]
10000 loops, best of 3: 27.4 us per loop

In [97]: %timeit [x for x in chain(*p) if x in chain(*q)]
10000 loops, best of 3: 12.6 us per loop
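With three-element inputs these timings mostly measure per-call overhead. A rough benchmark sketch on larger synthetic data compares the question's nested loop with a dict lookup; the generated paths and the size N are made up for illustration, and the numbers printed will vary by machine, so none are claimed here.

```python
import timeit

N = 1000  # hypothetical size; the nested loop's cost grows with N * N

# Synthetic data shaped like the question's: every other target has a match in q.
p = [(f'link{i}/a/b/c', f'target{i}/a/b/c') for i in range(N)]
q = [[f'target{i}/a/b/c', f'target{i}', i, i * 2] for i in range(0, N, 2)]


def nested_loop():
    # The question's approach: scan all of q for every element of p.
    out = []
    for link, target in p:
        for row in q:
            if row[0] == target:
                out.append(row + [link])
    return out


def dict_lookup():
    # Index q once by target path; assumes targets in q are unique.
    index = {row[0]: row for row in q}
    return [index[target] + [link] for link, target in p if target in index]


# Both approaches must agree before comparing their speed.
assert nested_loop() == dict_lookup()

for fn in (nested_loop, dict_lookup):
    seconds = timeit.timeit(fn, number=3)
    print(f'{fn.__name__}: {seconds / 3 * 1000:.1f} ms per call')
```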