Find the indices where the value changes in a sorted list of integers

Question

Dan*_*ejo 6 python algorithm binary-search

Suppose I have a sorted list of integers like this:

data = [1] * 3 + [4] * 5 + [5] * 2 + [9] * 3
# [1, 1, 1, 4, 4, 4, 4, 4, 5, 5, 9, 9, 9]

I want to find the indices at which the value changes (with len(data) closing the last run), i.e.

[3, 8, 10, 13]
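To pin down the expected output, here is a minimal O(n) reference sketch (the function name change_indices_linear is hypothetical, not from the post). It compares each element with its predecessor and then appends len(data) to close the last run, which is how 13 ends up in the list:

```python
def change_indices_linear(data):
    """End index of each run of equal values; the last entry is len(data)."""
    # A new run starts at every i where the value differs from its predecessor.
    result = [i for i in range(1, len(data)) if data[i - 1] != data[i]]
    if data:
        # Close the final run at the end of the list.
        result.append(len(data))
    return result


data = [1] * 3 + [4] * 5 + [5] * 2 + [9] * 3
print(change_indices_linear(data))  # [3, 8, 10, 13]
```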

One approach is to use itertools.groupby:

from itertools import groupby

cursor = 0
result = []
for key, group in groupby(data):
    # Advance the cursor by the length of this run of equal values.
    cursor += sum(1 for _ in group)
    result.append(cursor)
print(result)

Output

[3, 8, 10, 13]
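The same groupby idea can be written more compactly with itertools.accumulate, which takes the running sum of the run lengths. This is just a restatement of the loop above, still O(n), not a faster method:

```python
from itertools import accumulate, groupby

data = [1] * 3 + [4] * 5 + [5] * 2 + [9] * 3

# Length of each run of equal values: 3, 5, 2, 3
run_lengths = (sum(1 for _ in group) for _, group in groupby(data))
# Running totals of the run lengths give the end index of each run.
result = list(accumulate(run_lengths))
print(result)  # [3, 8, 10, 13]
```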

This approach is O(n). Another possible approach is to use bisect.bisect_left:

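from bisect import bisect_left

cursor = 0
result = []
while cursor < len(data):
    # Jump to the first index whose value is >= data[cursor] + 1 (integers only).
    cursor = bisect_left(data, data[cursor] + 1, cursor, len(data))
    result.append(cursor)
print(result)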

Output

[3, 8, 10, 13]
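The data[cursor] + 1 trick relies on the values being integers: with floats such as [0.5, 0.75] it would skip the 0.75 run. A possible variant (my assumption, not from the question; change_indices_bisect is a hypothetical name) uses bisect.bisect_right to skip past all copies of the current value, which works for any sorted list of mutually comparable values at the same O(k log n) cost:

```python
from bisect import bisect_right


def change_indices_bisect(data):
    """End index of each run of equal values in a sorted list, O(k log n)."""
    cursor = 0
    result = []
    while cursor < len(data):
        # First index after cursor whose value is greater than data[cursor].
        cursor = bisect_right(data, data[cursor], cursor, len(data))
        result.append(cursor)
    return result


print(change_indices_bisect([1] * 3 + [4] * 5 + [5] * 2 + [9] * 3))  # [3, 8, 10, 13]
print(change_indices_bisect(["a", "a", "b", "c", "c"]))  # [2, 3, 5]
print(change_indices_bisect([0.5, 0.75]))  # [1, 2]
```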

This approach is O(k log n), where k is the number of distinct elements. A variant of this approach uses exponential search.
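A minimal sketch of what the exponential-search (galloping) variant might look like, assuming the goal is the same output as above (change_indices_galloping is a hypothetical name). From the start of each run it probes offsets 1, 2, 4, 8, ... until it leaves the run, then binary-searches only the last window, so a run of length r costs roughly O(log r) comparisons instead of O(log n):

```python
from bisect import bisect_right


def change_indices_galloping(data):
    """End index of each run, found with exponential (galloping) search."""
    n = len(data)
    cursor = 0
    result = []
    while cursor < n:
        value = data[cursor]
        # Invariant: data[lo] == value. Double the step until hi leaves the
        # run (data[hi] != value) or runs past the end of the list.
        lo, step = cursor, 1
        hi = cursor + step
        while hi < n and data[hi] == value:
            lo = hi
            step *= 2
            hi = cursor + step
        # The run ends inside (lo, min(hi, n)]; binary-search just that window.
        cursor = bisect_right(data, value, lo, min(hi, n))
        result.append(cursor)
    return result


print(change_indices_galloping([1] * 3 + [4] * 5 + [5] * 2 + [9] * 3))  # [3, 8, 10, 13]
```

Because it compares with == and bisect_right rather than adding 1, this sketch does not depend on the values being integers.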

Is there a faster or more efficient way to do this?

Answer 1

tri*_*cot 5

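In terms of asymptotic complexity, I think you can improve slightly on the repeated binary search, on average, with a more evenly balanced divide-and-conquer: first try to pinpoint a value change near the middle of the input list. That splits the range roughly in half, so each subsequent binary search runs over about half the range and its search path is about one step shorter.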

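However, since this is Python, the gain may not be noticeable because of interpreter overhead (yield, yield from, recursion, etc.). For list sizes like the one you used, it may even perform worse: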

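from bisect import bisect_left

def locate(data, start, end):
    # Empty range, or a single run of equal values: no change inside.
    if start >= end or data[start] == data[end - 1]:
        return
    mid = (start + end) // 2
    val = data[mid]
    if val == data[start]:
        # data[start:mid + 1] is one run; search for the next larger integer.
        start = mid
        val += 1
    # First index in (start, end) whose value is >= val: a change point.
    i = bisect_left(data, val, start + 1, end)
    yield from locate(data, start, i)
    yield i
    yield from locate(data, i, end)

data = [1] * 3 + [4] * 5 + [5] * 2 + [9] * 3
print(*locate(data, 0, len(data)))  # 3 8 10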

Note that this only outputs valid indices, so for this example input 13 is not included.
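If you want the question's output format (ending with len(data)) and want to avoid the generator recursion mentioned above, a possible iterative sketch of the same divide-and-conquer uses an explicit stack. The name change_indices_divide is hypothetical; like locate() it assumes integer values because of val += 1, and it has not been benchmarked, so it may be no faster than the recursive version:

```python
from bisect import bisect_left


def change_indices_divide(data):
    """Stack-based version of the divide-and-conquer, plus len(data) at the end."""
    result = []
    # (start, end) is a range to search; (i, None) means "emit index i".
    # Pushing right, emit, left makes pops come out in ascending index order.
    stack = [(0, len(data))]
    while stack:
        start, end = stack.pop()
        if end is None:
            result.append(start)
            continue
        if start >= end or data[start] == data[end - 1]:
            continue  # empty range or a single run: no change inside
        mid = (start + end) // 2
        val = data[mid]
        if val == data[start]:
            # data[start:mid + 1] is one run; look for the next larger integer.
            start = mid
            val += 1
        i = bisect_left(data, val, start + 1, end)
        stack.append((i, end))    # right part, handled last
        stack.append((i, None))   # the change index itself
        stack.append((start, i))  # left part, handled first
    if data:
        result.append(len(data))  # close the final run, as in the question
    return result


print(change_indices_divide([1] * 3 + [4] * 5 + [5] * 2 + [9] * 3))  # [3, 8, 10, 13]
```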

Archived: 4 years, 2 months ago
Viewed: 517 times
Last activity: 4 years, 2 months ago