Filter a list of tuples based on a condition

Question

Given a list of tuples, if several tuples share the same first element, keep only the one among them with the largest last element.

For example:

sample_list = [(5,16,2),(5,10,3),(5,8,1),(21,24,1)]

In sample_list above, the first 3 tuples share the same first element, 5. Of these, only the second tuple should be kept, because it has the largest last element => 3.

Expected output:

op = [(5,10,3),(21,24,1)]

Code:

op = []
for m in range(len(sample_list)):
    li = [sample_list[m]]
    for n in range(len(sample_list)):
        if(sample_list[m][0] == sample_list[n][0]
           and sample_list[m][2] != sample_list[n][2]):
            li.append(sample_list[n])
    op.append(sorted(li,key=lambda dd:dd[2],reverse=True)[0])

print (list(set(op)))

This works, but it is very slow for long lists. Is there a more Pythonic or more efficient way to do this?

Answer 1

sch*_*ggl 16

Use itertools.groupby and operator.itemgetter for readability. Within each group, apply max with an appropriate key function, again using itemgetter for brevity:

from itertools import groupby
from operator import itemgetter as ig

lst = [(5, 10, 3), (21, 24, 1), (5, 8, 1), (5, 16, 2)]

[max(g, key=ig(-1)) for _, g in groupby(sorted(lst), key=ig(0))]
# [(5, 10, 3), (21, 24, 1)]
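
One detail worth spelling out: groupby only merges adjacent items with equal keys, which is why the input is sorted first. The sketch below illustrates this on the same data; sorting by the first element alone (rather than by the whole tuple, as above) is a variation chosen here for illustration, not something the answer prescribes.

```python
from itertools import groupby
from operator import itemgetter

lst = [(5, 10, 3), (21, 24, 1), (5, 8, 1), (5, 16, 2)]

# Without sorting, the key 5 shows up in two separate runs.
unsorted_keys = [k for k, _ in groupby(lst, key=itemgetter(0))]
print(unsorted_keys)  # [5, 21, 5]

# Sorting by the grouping key is enough to make equal keys adjacent.
by_first = sorted(lst, key=itemgetter(0))
sorted_keys = [k for k, _ in groupby(by_first, key=itemgetter(0))]
print(sorted_keys)  # [5, 21]

result = [max(g, key=itemgetter(-1))
          for _, g in groupby(by_first, key=itemgetter(0))]
print(result)  # [(5, 10, 3), (21, 24, 1)]
```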

For a linear-time solution whose extra space is bounded by the number of unique first elements, you can use a dict:

d = {}
for tpl in lst:
    first, *_, last = tpl
    if first not in d or last > d[first][-1]:
        d[first] = tpl

[*d.values()]
# [(5, 10, 3), (21, 24, 1)]
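
The same loop can be wrapped in a function and run against the question's sample_list. This is a minimal sketch: the name keep_max_last is hypothetical, and the tie behaviour shown (the earliest tuple wins) simply follows from the strict > comparison. The output order relies on dicts preserving insertion order, which Python guarantees from 3.7 onward.

```python
def keep_max_last(tuples):
    """Keep, per first element, the tuple with the largest last element."""
    best = {}
    for tpl in tuples:
        key, last = tpl[0], tpl[-1]
        # Strict '>' means the earliest tuple wins when last elements tie.
        if key not in best or last > best[key][-1]:
            best[key] = tpl
    return list(best.values())


sample_list = [(5, 16, 2), (5, 10, 3), (5, 8, 1), (21, 24, 1)]
print(keep_max_last(sample_list))  # [(5, 10, 3), (21, 24, 1)]

# Tie on the last element: the first tuple seen is kept.
print(keep_max_last([(1, "a", 2), (1, "b", 2)]))  # [(1, 'a', 2)]
```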

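To be fair, I used one at first, but it turned out to be unnecessary. (2 upvotes)
The second option is simple and about as performant as it gets. (2 upvotes)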

Answer 2

Dan*_*ejo 13

You can use a defaultdict to group the tuples that share the same first element, then take the maximum of each group by the third element:

from collections import defaultdict

sample_list = [(5,16,2),(5,10,3),(5,8,1),(21,24,1)]

d = defaultdict(list)
for e in sample_list:
    d[e[0]].append(e)

res = [max(val, key=lambda x: x[2]) for val in d.values()]
print(res)

Output

[(5, 10, 3), (21, 24, 1)]

This approach is O(n).
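
To see how the O(n) grouping compares with a sort-based O(n log n) approach on a particular machine, a rough timing sketch follows. The function names, data size, and seed are assumptions made up for this illustration; the last element is unique per tuple so the two approaches cannot disagree on ties. No timings are quoted because they vary by machine and Python version.

```python
import random
import timeit
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def by_groupby(tuples):
    # O(n log n): the sort dominates.
    ordered = sorted(tuples, key=itemgetter(0))
    return [max(g, key=itemgetter(-1))
            for _, g in groupby(ordered, key=itemgetter(0))]


def by_defaultdict(tuples):
    # O(n) on average: one pass to group, then max visits each tuple once.
    groups = defaultdict(list)
    for t in tuples:
        groups[t[0]].append(t)
    return [max(g, key=itemgetter(-1)) for g in groups.values()]


random.seed(0)
# Hypothetical data: 100,000 tuples, up to 1,000 distinct first elements,
# and a unique last element per tuple (no ties).
data = [(random.randrange(1000), 0, i) for i in range(100_000)]

# Both approaches pick the same tuples; only the output order may differ.
assert sorted(by_groupby(data)) == sorted(by_defaultdict(data))

for fn in (by_groupby, by_defaultdict):
    seconds = timeit.timeit(lambda: fn(data), number=5)
    print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```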

Answer 3

U10*_*ard 5

Try itertools.groupby:

from itertools import groupby
sample_list.sort()
print([max(l, key=lambda x: x[-1]) for _, l in groupby(sample_list, key=lambda x: x[0])])

Alternatively, use operator.itemgetter:

from itertools import groupby
from operator import itemgetter
sample_list.sort()
print([max(l, key=itemgetter(-1)) for _, l in groupby(sample_list, key=itemgetter(0))])

For performance, try:

from operator import itemgetter
dct = {}
for i in sample_list:
    if i[0] in dct:
        dct[i[0]].append(i)
    else:
        dct[i[0]] = [i]
print([max(v, key=itemgetter(-1)) for v in dct.values()])

All of them output:

[(5, 10, 3), (21, 24, 1)]
