Find the distance to the nearest zero in a NumPy array

sla*_*law 13 python numpy

Let's say I have a NumPy array:

import numpy as np
x = np.array([0, 1, 2, 0, 4, 5, 6, 7, 0, 0])

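At each index, I want to find the distance to the nearest zero value. If the position is itself a zero, the distance is zero. Beyond that, we are only interested in the distance to the nearest zero to the right of the current position. The super naive approach would be something like: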

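# Brute force: from each index, scan right until a zero (or the end) is found
out = np.full(x.shape[0], x.shape[0]-1)
for i in range(x.shape[0]):
    j = 0
    while i + j < x.shape[0]:
        if x[i+j] == 0:
            break
        j += 1
    out[i] = j  # number of steps taken to reach the nearest zero on the right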

And the output would be:

array([0, 2, 1, 0, 4, 3, 2, 1, 0, 0])

I noticed a countdown/decrement pattern in the output between the zeros, so I might be able to use the positions of the zeros (i.e., zero_indices = np.argwhere(x == 0).flatten()).

What is the fastest way to get the desired output in linear time?

Div*_*kar 10

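Approach #1: searchsorted to the rescue, in a vectorized manner (before the numba folks come in)! Note that np.searchsorted does a binary search per lookup, so for k zeros this is O(n log k) rather than strictly linear time.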

mask_z = x==0
idx_z = np.flatnonzero(mask_z)
idx_nz = np.flatnonzero(~mask_z)

# Cover for the case when there's no 0 left to the right
# (for same results as with posted loop-based solution)
if x[-1]!=0:
    idx_z = np.r_[idx_z,len(x)]

out = np.zeros(len(x), dtype=int)
idx = np.searchsorted(idx_z, idx_nz)
out[~mask_z] = idx_z[idx] - idx_nz
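
For reuse, the steps above can be wrapped in a function. This is only a minimal sketch: the name next_zero_distance_searchsorted is a hypothetical label that does not appear in the original answer, and a 1-D input is assumed.

```python
import numpy as np

def next_zero_distance_searchsorted(x):
    """Distance from each index to the nearest zero at or to its right.

    Hypothetical wrapper around the searchsorted approach; assumes x is 1-D.
    Positions with no zero to their right measure the distance to len(x),
    which matches the loop-based version from the question.
    """
    x = np.asarray(x)
    mask_z = x == 0
    idx_z = np.flatnonzero(mask_z)     # positions of the zeros
    idx_nz = np.flatnonzero(~mask_z)   # positions of the non-zeros

    # Add a sentinel "zero" one past the end if the array does not end in a zero
    if x.size and x[-1] != 0:
        idx_z = np.r_[idx_z, len(x)]

    out = np.zeros(len(x), dtype=int)
    # For each non-zero position, find the first zero position to its right
    idx = np.searchsorted(idx_z, idx_nz)
    out[~mask_z] = idx_z[idx] - idx_nz
    return out

print(next_zero_distance_searchsorted([0, 1, 2, 0, 4, 5, 6, 7, 0, 0]))
# [0 2 1 0 4 3 2 1 0 0]
print(next_zero_distance_searchsorted([1, 0, 2, 3]))
# [1 0 2 1]
```

The second call exercises the sentinel branch: the last two elements have no zero to their right, so they count down to the end of the array.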

Approach #2: Another one, with some cumsum:

mask_z = x==0
idx_z = np.flatnonzero(mask_z)

# Cover for the case when there's no 0 left to the right
if x[-1]!=0:
    idx_z = np.r_[idx_z,len(x)]

out = idx_z[np.r_[False,mask_z[:-1]].cumsum()] - np.arange(len(x))
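
To see why the cumsum indexing works, it may help to look at the intermediate arrays for the sample x. The names shifted, group and next_zero are introduced here purely for illustration and are not part of the original answer.

```python
import numpy as np

x = np.array([0, 1, 2, 0, 4, 5, 6, 7, 0, 0])
mask_z = x == 0
idx_z = np.flatnonzero(mask_z)            # [0 3 8 9]
if x[-1] != 0:
    idx_z = np.r_[idx_z, len(x)]          # sentinel, not needed for this x

# Shift the zero mask right by one: a position moves on to the next zero
# only after a zero has been passed.
shifted = np.r_[False, mask_z[:-1]]

# Running count of zeros strictly to the left = index into idx_z of the
# nearest zero at or to the right of each position.
group = shifted.cumsum()                  # [0 1 1 1 2 2 2 2 2 3]
next_zero = idx_z[group]                  # [0 3 3 3 8 8 8 8 8 9]

out = next_zero - np.arange(len(x))
print(out)                                # [0 2 1 0 4 3 2 1 0 0]
```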

Alternatively, the final cumsum step could be replaced by repeat:

r = np.r_[idx_z[0]+1,np.diff(idx_z)]
out = np.repeat(idx_z,r)[:len(x)] - np.arange(len(x))
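
The repeat variant relies on idx_z from the preceding snippet. A self-contained sketch might look like the following; the function name next_zero_distance_repeat is hypothetical, and a non-empty 1-D input is assumed.

```python
import numpy as np

def next_zero_distance_repeat(x):
    """Hypothetical wrapper around the repeat variant; assumes non-empty 1-D x."""
    x = np.asarray(x)
    idx_z = np.flatnonzero(x == 0)
    # Sentinel "zero" one past the end if the array does not end in a zero
    if x[-1] != 0:
        idx_z = np.r_[idx_z, len(x)]

    # The first zero serves positions 0..idx_z[0] (idx_z[0] + 1 of them);
    # each later zero serves the positions after the previous zero up to itself.
    r = np.r_[idx_z[0] + 1, np.diff(idx_z)]

    # Trim to len(x): with a sentinel the repeated array is one element too long
    next_zero = np.repeat(idx_z, r)[:len(x)]
    return next_zero - np.arange(len(x))

print(next_zero_distance_repeat([0, 1, 2, 0, 4, 5, 6, 7, 0, 0]))
# [0 2 1 0 4 3 2 1 0 0]
print(next_zero_distance_repeat([1, 0, 2, 3]))
# [1 0 2 1]
```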

Approach #3: Another one, mostly with cumsum:

mask_z = x==0
idx_z = np.flatnonzero(mask_z)

pp = np.full(len(x), -1)
pp[idx_z[:-1]] = np.diff(idx_z) - 1
if idx_z[0]==0:
    pp[0] = idx_z[1]
else:
    pp[0] = idx_z[0]
out = pp.cumsum()

# Handle the boundary case and assign 0s at the original zero positions
out[idx_z[-1]:] = np.arange(len(x)-idx_z[-1],0,-1)
out[mask_z] = 0
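
One way to gain confidence in a vectorized version like this is to compare it against the loop from the question on random inputs. The sketch below does that for Approach #3. The function names are hypothetical, and the test arrays are forced to contain at least two zeros, because the snippet reads idx_z[1] whenever x starts with a zero.

```python
import numpy as np

def naive_next_zero_distance(x):
    # Reference: scan right from each index until a zero or the end is reached
    out = np.zeros(len(x), dtype=int)
    for i in range(len(x)):
        j = 0
        while i + j < len(x) and x[i + j] != 0:
            j += 1
        out[i] = j
    return out

def cumsum_next_zero_distance(x):
    # Approach #3 wrapped in a function; assumes x contains at least two zeros
    mask_z = x == 0
    idx_z = np.flatnonzero(mask_z)
    pp = np.full(len(x), -1)
    pp[idx_z[:-1]] = np.diff(idx_z) - 1
    pp[0] = idx_z[1] if idx_z[0] == 0 else idx_z[0]
    out = pp.cumsum()
    out[idx_z[-1]:] = np.arange(len(x) - idx_z[-1], 0, -1)
    out[mask_z] = 0
    return out

rng = np.random.default_rng(0)
for _ in range(200):
    n = int(rng.integers(2, 30))
    x = rng.integers(0, 4, size=n)
    x[rng.choice(n, size=2, replace=False)] = 0   # guarantee two zeros
    assert np.array_equal(cumsum_next_zero_distance(x),
                          naive_next_zero_distance(x))
```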