使用Python如何通过有序子集匹配减少列表列表[[..],[..],..]?
在此问题的上下文中,列表L是列表的子集,M如果M包含所有成员L,并且顺序相同.例如,列表[1,2]是列表[1,2,3]的子集,但不是列表[2,1,3]的子集.
输入示例:
a. [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
b. [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
Run Code Online (Sandbox Code Playgroud)
预期结果:
a. [[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]
b. [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69], [2, 3, 21], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
Run Code Online (Sandbox Code Playgroud)
更多例子:
L = [[1, 2, 3, 4, 5, 6, 7], [1, 2, 5, 6]] - 没有减少
L = [[1, 2, 3, 4, 5, 6, 7], ,[1, 2, 3][1, 2, 4, 8]]- 是减少
L = [[1, 2, 3, 4, 5, 6, 7], [7, 6, 5, 4, 3, 2, 1]] - 没有减少
(很抱歉导致与不正确的数据集混淆.)
这可以简化,但是:
l = [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3], [2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
l2 = l[:]
for m in l:
for n in l:
if set(m).issubset(set(n)) and m != n:
l2.remove(m)
break
print l2
[[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]
Run Code Online (Sandbox Code Playgroud)
这段代码应该具有相当的内存效率。除了存储您的初始列表列表之外,此代码使用的额外内存可忽略不计(不会创建列表的临时集或副本)。
def is_subset(needle,haystack):
""" Check if needle is ordered subset of haystack in O(n) """
if len(haystack) < len(needle): return False
index = 0
for element in needle:
try:
index = haystack.index(element, index) + 1
except ValueError:
return False
else:
return True
def filter_subsets(lists):
""" Given list of lists, return new list of lists without subsets """
for needle in lists:
if not any(is_subset(needle, haystack) for haystack in lists
if needle is not haystack):
yield needle
my_lists = [[1, 2, 4, 8], [1, 2, 4, 5, 6], [1, 2, 3],
[2, 3, 21], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6, 7]]
print list(filter_subsets(my_lists))
>>> [[1, 2, 4, 8], [2, 3, 21], [1, 2, 3, 4, 5, 6, 7]]
Run Code Online (Sandbox Code Playgroud)
而且,只是为了好玩,一个单线:
def filter_list(L):
return [x for x in L if not any(set(x)<=set(y) for y in L if x is not y)]
Run Code Online (Sandbox Code Playgroud)
感谢所有提出解决方案并处理我有时错误的数据集的人。使用@hughdbrown解决方案我将其修改为我想要的:
修改是在目标上使用滑动窗口以确保找到子集序列。我认为我应该使用比“设置”更合适的词来描述我的问题。
def is_sublist_of_any_list(cand, lists):
# Compare candidate to a single list
def is_sublist_of_list(cand, target):
try:
i = 0
try:
start = target.index(cand[0])
except:
return False
while start < (len(target) + len(cand)) - start:
if cand == target[start:len(cand)]:
return True
else:
start = target.index(cand[0], start + 1)
except ValueError:
return False
# See if candidate matches any other list
return any(is_sublist_of_list(cand, target) for target in lists if len(cand) <= len(target))
# Compare candidates to all other lists
def super_lists(lists):
a = [cand for i, cand in enumerate(lists) if not is_sublist_of_any_list(cand, lists[:i] + lists[i+1:])]
return a
lists = [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
expect = [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69], [2, 3, 21], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
def test():
out = super_lists(list(lists))
print "In : ", lists
print "Out : ", out
assert (out == expect)
Run Code Online (Sandbox Code Playgroud)
结果:
In : [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [1], [1, 2, 3, 4], [1, 2], [17, 18, 19, 22, 41, 48], [2, 3], [1, 2, 3], [50, 69], [1, 2, 3], [2, 3, 21], [1, 2, 3], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
Out : [[2, 16, 17], [1, 2, 3, 4, 5, 6, 7], [17, 18, 19, 22, 41, 48], [50, 69], [2, 3, 21], [1, 2, 4, 8], [1, 2, 4, 5, 6]]
Run Code Online (Sandbox Code Playgroud)