快速查找数组中重复元素的长度和起始索引的方法

Question

快速查找数组中重复元素的长度和起始索引的方法

我有一个数组 A：

import numpy as np
A = np.array( [0, 0, 1, 1, 1, 0, 1, 1, 0 ,0, 1, 0] )

Run Code Online (Sandbox Code Playgroud)

连续“1”的长度为：

output: [3, 2, 1]

Run Code Online (Sandbox Code Playgroud)

具有相应的起始索引：

idx = [2, 6, 10]

Run Code Online (Sandbox Code Playgroud)

原始数组很大，我更喜欢使用较少 for 循环的解决方案。

编辑（运行时）：

import numpy as np
import time

A = np.array( [0, 0, 1, 1, 1, 0, 1, 1, 0 ,0, 1, 0] )

def LoopVersion(A):
    l_A = len(A)
    size = []
    idx = []
    temp_idx = []
    temp_size = []
    for i in range(l_A):
        if A[i] == 1:
            temp_size.append(1)
            if not temp_idx:
                temp_idx = i
                idx.append(temp_idx)
        else:
            size.append( len(temp_size) )
            size = [i for i in size if i != 0]
            temp_size = []
            temp_idx = []
    return size, idx

Run Code Online (Sandbox Code Playgroud)

Quang的解决方案：

def UniqueVersion(A):
    _, idx, counts = np.unique(np.cumsum(1-A)*A, return_index=True, return_counts=True)
    return idx, counts

Run Code Online (Sandbox Code Playgroud)

雅可的解决方案：

def ConcatVersion(A):
    A = np.concatenate(([0], A, [0]))  #  get rid of some edge cases
    starts = np.argwhere((A[:-1] + A[1:]) == 1).ravel()[::2]
    ends = np.argwhere((A[:-1] + A[1:]) == 1).ravel()[1::2]
    len_of_repeats = ends - starts
    return starts, len_of_repeats

Run Code Online (Sandbox Code Playgroud)

Dan 的解决方案（也适用于特殊情况）：

def structure(A):
    ZA = np.concatenate(([0], A, [0]))
    indices = np.flatnonzero( ZA[1:] != ZA[:-1] )
    counts = indices[1:] - indices[:-1]
    return indices[::2], counts[::2]

Run Code Online (Sandbox Code Playgroud)

10000 个元素的运行时分析：

np.random.seed(1234)
B = np.random.randint(2, size=10000)


start = time.time()
size, idx = LoopVersion(B)
end = time.time()
print ( (end - start) )
# 0.32489800453186035 seconds

start = time.time()
idx, counts = UniqueVersion(B)
end = time.time()
print ( (end - start) )
# 0.008305072784423828 seconds

start = time.time()
idx, counts = ConcatVersion(B)
end = time.time()
print ( (end - start) )
# 0.0009801387786865234 seconds

start = time.time()
idx, counts = structure(B)
end = time.time()
print ( (end - start) )
# 0.000347137451171875 seconds

Run Code Online (Sandbox Code Playgroud)

Answer 1

Qua*_*ang 5

让我们试试unique：

_, idx, counts = np.unique(np.cumsum(1-A)*A, return_index=True, return_counts=True)

# your expected output:
idx, counts

Run Code Online (Sandbox Code Playgroud)

输出：

(array([ 2,  6, 10]), array([3, 2, 1]))

Run Code Online (Sandbox Code Playgroud)

Answer 2

小智 3

这里是一个行人的尝试，通过对问题进行编程来解决问题。

我们在前面加上一个零A，得到一个向量ZA，然后通过比较移位版本和来检测1岛屿，以及0以交替方式出现的岛屿。（在构造的数组中，我们采用偶数位置，对应于中的位置。）ZAZA[1:]ZA[-1]A

import numpy as np

def structure(A):
    ZA = np.concatenate(([0], A, [0]))
    indices = np.flatnonzero( ZA[1:] != ZA[:-1] )
    counts = indices[1:] - indices[:-1]
    return indices[::2], counts[::2]

Run Code Online (Sandbox Code Playgroud)

一些示例运行：

In [71]: structure(np.array( [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0] ))
Out[71]: (array([ 2,  6, 10]), array([3, 2, 1]))

In [72]: structure(np.array( [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1] ))
Out[72]: (array([ 0,  5,  9, 13, 15]), array([3, 3, 2, 1, 1]))

In [73]: structure(np.array( [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0] ))
Out[73]: (array([0, 5, 9]), array([3, 3, 2]))

In [74]: structure(np.array( [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1] ))
Out[74]: (array([ 0,  2,  5,  7, 11, 14]), array([1, 2, 1, 3, 2, 3]))

Run Code Online (Sandbox Code Playgroud)

归档时间：	5 年，4 月前
查看次数：	232 次
最近记录：	5 年，4 月前