快速查找数组中重复元素的长度和起始索引的方法

Har*_*dKo 11 python numpy

我有一个数组 A:

import numpy as np
A = np.array( [0, 0, 1, 1, 1, 0, 1, 1, 0 ,0, 1, 0] )
Run Code Online (Sandbox Code Playgroud)

连续“1”的长度为:

output: [3, 2, 1]
Run Code Online (Sandbox Code Playgroud)

具有相应的起始索引:

idx = [2, 6, 10]
Run Code Online (Sandbox Code Playgroud)

原始数组很大,我更喜欢使用较少 for 循环的解决方案。

编辑(运行时):

import numpy as np
import time

A = np.array( [0, 0, 1, 1, 1, 0, 1, 1, 0 ,0, 1, 0] )

def LoopVersion(A):
    l_A = len(A)
    size = []
    idx = []
    temp_idx = []
    temp_size = []
    for i in range(l_A):
        if A[i] == 1:
            temp_size.append(1)
            if not temp_idx:
                temp_idx = i
                idx.append(temp_idx)
        else:
            size.append( len(temp_size) )
            size = [i for i in size if i != 0]
            temp_size = []
            temp_idx = []
    return size, idx
Run Code Online (Sandbox Code Playgroud)

Quang的解决方案:

def UniqueVersion(A):
    _, idx, counts = np.unique(np.cumsum(1-A)*A, return_index=True, return_counts=True)
    return idx, counts
Run Code Online (Sandbox Code Playgroud)

雅可的解决方案:

def ConcatVersion(A):
    A = np.concatenate(([0], A, [0]))  #  get rid of some edge cases
    starts = np.argwhere((A[:-1] + A[1:]) == 1).ravel()[::2]
    ends = np.argwhere((A[:-1] + A[1:]) == 1).ravel()[1::2]
    len_of_repeats = ends - starts
    return starts, len_of_repeats
Run Code Online (Sandbox Code Playgroud)

Dan 的解决方案(也适用于特殊情况):

def structure(A):
    ZA = np.concatenate(([0], A, [0]))
    indices = np.flatnonzero( ZA[1:] != ZA[:-1] )
    counts = indices[1:] - indices[:-1]
    return indices[::2], counts[::2]
Run Code Online (Sandbox Code Playgroud)

10000 个元素的运行时分析:

np.random.seed(1234)
B = np.random.randint(2, size=10000)


start = time.time()
size, idx = LoopVersion(B)
end = time.time()
print ( (end - start) )
# 0.32489800453186035 seconds

start = time.time()
idx, counts = UniqueVersion(B)
end = time.time()
print ( (end - start) )
# 0.008305072784423828 seconds

start = time.time()
idx, counts = ConcatVersion(B)
end = time.time()
print ( (end - start) )
# 0.0009801387786865234 seconds

start = time.time()
idx, counts = structure(B)
end = time.time()
print ( (end - start) )
# 0.000347137451171875 seconds
Run Code Online (Sandbox Code Playgroud)

Qua*_*ang 5

让我们试试unique

_, idx, counts = np.unique(np.cumsum(1-A)*A, return_index=True, return_counts=True)

# your expected output:
idx, counts
Run Code Online (Sandbox Code Playgroud)

输出:

(array([ 2,  6, 10]), array([3, 2, 1]))
Run Code Online (Sandbox Code Playgroud)


小智 3

这里是一个行人的尝试,通过对问题进行编程来解决问题。

我们在 前面加上一个零A,得到一个向量ZA,然后通过比较移位版本和来检测1岛屿,以及0以交替方式出现的岛屿。(在构造的数组中,我们采用偶数位置,对应于 中的位置。)ZAZA[1:]ZA[-1]A

import numpy as np

def structure(A):
    ZA = np.concatenate(([0], A, [0]))
    indices = np.flatnonzero( ZA[1:] != ZA[:-1] )
    counts = indices[1:] - indices[:-1]
    return indices[::2], counts[::2]
Run Code Online (Sandbox Code Playgroud)

一些示例运行:

In [71]: structure(np.array( [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0] ))
Out[71]: (array([ 2,  6, 10]), array([3, 2, 1]))

In [72]: structure(np.array( [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1] ))
Out[72]: (array([ 0,  5,  9, 13, 15]), array([3, 3, 2, 1, 1]))

In [73]: structure(np.array( [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0] ))
Out[73]: (array([0, 5, 9]), array([3, 3, 2]))

In [74]: structure(np.array( [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1] ))
Out[74]: (array([ 0,  2,  5,  7, 11, 14]), array([1, 2, 1, 3, 2, 3]))
Run Code Online (Sandbox Code Playgroud)