检查列表是否是有效的块序列

Question

检查列表是否是有效的块序列

don*_*ode 8 python validation list sequence chunks

我想检查列表是否是有效的块序列，其中每个块以某个值开始，以下一次出现的相同值结束。例如，这是三个块的有效序列：

lst = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4, 5, 2]
       \___________/  \_____/  \_______________________/

Run Code Online (Sandbox Code Playgroud)

这是无效的：

lst = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4]
       \___________/  \_____/  \_____ ... missing the 2 to end the chunk

Run Code Online (Sandbox Code Playgroud)

我有一个解决方案，但它很糟糕。你看到更好的东西了吗？

def is_valid(lst):
    while lst:
        start = lst.pop(0)
        if start not in lst:
            return False
        while lst[0] != start:
            lst.pop(0)
        lst.remove(start)
    return True

# Tests, should print: True, False, True, False, True
print(is_valid([2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4, 5, 2]))
print(is_valid([2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4]))
print(is_valid(['I', 'N', 'O', 'A', 'I', 'L', 'L', 'T', 'R', 'X', 'I', 'I', 'N', 'X', 'F', 'T']))
print(is_valid(['T', 'I', 'N', 'I', 'X', 'R', 'O', 'F', 'T', 'I', 'N', 'I', 'X', 'L', 'L', 'A']))
print(is_valid([]))

Run Code Online (Sandbox Code Playgroud)

Answer 1

tob*_*s_k 10

怎么样，iter从列表中创建一个并在该迭代器上向前搜索，直到next找到匹配的元素。请注意，这可能会失败，因为它None可以是列表的元素；那么你应该定义并与哨兵进行比较obj = object()。

def is_valid(lst):
    it = iter(lst)
    for x in it:
        if next((y for y in it if y == x), None) is None:
            return False
    return True

Run Code Online (Sandbox Code Playgroud)

由于我们实际上并不需要的返回值next，所以我们也可以直接使用any来代替，同时解决了default元素的问题。与一样next，any将消耗迭代器直至匹配元素（如果有）：

def is_valid(lst):
    it = iter(lst)
    for x in it:
        if not any(y == x for y in it):
            return False
    return True

Run Code Online (Sandbox Code Playgroud)

可以使用all而不是外for循环进一步缩短：

def is_valid(lst):
    it = iter(lst)
    return all(any(y == x for y in it) for x in it)

Run Code Online (Sandbox Code Playgroud)

这最终可以简化为同样神秘和有趣的：

def is_valid(lst):
    it = iter(lst)
    return all(x in it for x in it)

Run Code Online (Sandbox Code Playgroud)

每种方式，每个元素都被访问一次，原始列表没有改变，几乎没有额外的空间，恕我直言，它甚至有点容易阅读和理解。

这从来都与速度无关，但无论如何：这里是不同解决方案（以及更多变体）的一些基准，运行问题中的测试用例以及两个包含 1,000 个整数的随机列表，一个有效，一个无效，10,000 次，在 Python 3.8.10 上：

# with long lists             # only short test lists
1.52 is_valid_index           0.22 is_valid_index
3.28 is_valid_next            0.30 is_valid_next
2.78 is_valid_for_for_else    0.13 is_valid_for_for_else
5.26 is_valid_for_any         0.32 is_valid_for_any
5.29 is_valid_all_any         0.38 is_valid_all_any
3.42 is_valid_all_any_if      0.36 is_valid_all_any_if
2.02 is_valid_all_in          0.18 is_valid_all_in
1.97 is_valid_all_in_if       0.17 is_valid_all_in_if
1.87 is_valid_for_in          0.11 is_valid_for_in

Run Code Online (Sandbox Code Playgroud)

当然，都是O(n)。对于 1000 个元素的长列表，使用的解决方案index是最快的，但使用的解决方案x in it也不错。这些解决方案有些落后，但与使用带有条件的生成器any时一样快（或慢），但仍然比使用普通循环时慢。由于只有简短的测试列表，情况有点不同：这里，使用一个迭代器和的解决方案速度最快，并且速度快了很多。nextforfor-for-elsefor-in

Answer 2

L3v*_*han 5

这是我对这个问题的看法。我针对可读性而非速度进行了优化（当然，同时保持 O(n) 的复杂度）：

def is_valid(sequence):
    iterator = iter(sequence)
    for element in iterator:
        for other in iterator:
            if element == other:
                break
        else:
            return False
    return True

Run Code Online (Sandbox Code Playgroud)

外循环的每次迭代对应一个块。当我们没有这里的元素时，我们在块边界处结束序列，我们可以return True。否则，我们循环遍历迭代器，直到找到匹配的元素。如果我们用完元素（一个“自然”终止的 for 循环，没有break，进入它的else）我们return False。

这是另一个使用itertools. 我不喜欢上面的解决方案，主要是因为next与哨兵的神秘使用：

from itertools import dropwhile

def is_valid(iterable):
    iterator = iter(iterable)
    sentinel = object()
    for element in iterator:
        if next(dropwhile(lambda x: x != element, iterator), sentinel) is sentinel:
            return False
    return True

Run Code Online (Sandbox Code Playgroud)

归档时间：	4 年，5 月前
查看次数：	844 次
最近记录：	4 年，5 月前