A faster way to check whether all elements in a window of a NumPy array are finite

sla*_*law 5 python arrays performance numpy

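I have a very long NumPy array with 1_000_000_000 elements, and I want to slide a 50-element window across the array and ask whether all elements inside the window are finite. If all 50 elements in a window are finite, return True (for that window); otherwise, if one or more elements in the 50-element window are not finite, return False (for that window). This evaluation continues until every window has been evaluated. One nice way to do this is: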

import numpy as np

def rolling_window(a, window):
    a = np.asarray(a)
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)

    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

if __name__ == "__main__":
    a = np.random.rand(100_000_000)  # This is 10x shorter than my real data
    w = 50
    idx = np.random.randint(0, len(a), size=len(a)//10)  # Simulate having np.nan in my array
    a[idx] = np.nan
    print(np.all(rolling_window(np.isfinite(a), w), axis=1))

However, this is slow when my array has length 1_000_000_000. Is there a faster way to accomplish this that does not require a large amount of memory?

Div*_*kar 1

Approach #1: abuse the strided window to assign directly into the isfinite mask:

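def strided_allfinite(a, w):
    # Relies on rolling_window() from the question.
    m = np.isfinite(a)
    # Writable strided view: row i aliases m[i:i+w].
    p = rolling_window(m, w)
    # A non-finite value among the first w elements invalidates every earlier window start.
    nmW = ~m[:w]
    if nmW.any():
        m[:np.flatnonzero(nmW).max()] = False
    # A non-finite value at position i+w-1 invalidates window starts i .. i+w-1.
    p[~m[w-1:]] = False
    # One flag per window start; len(m)-w+1 also handles w == 1, where m[:-w+1] would be empty.
    return m[:len(m) - w + 1]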
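To see what the strided assignment in Approach #1 amounts to, here is a slow reference sketch of the same idea with plain slicing: every non-finite element at position `j` invalidates the window starts `j-w+1 .. j`. The helper name `allfinite_by_invalidation` is hypothetical (it is not part of the answer above), and the sketch assumes a 1-D input with `len(a) >= w`. It loops in Python over the non-finite positions, so it is meant for understanding and for cross-checking on small inputs, not for the full-size array.

```python
import numpy as np

def allfinite_by_invalidation(a, w):
    # Hypothetical reference helper: start with every window marked valid,
    # then let each non-finite element at position j invalidate the windows
    # that contain it, i.e. the window starts j-w+1 .. j.
    a = np.asarray(a)
    n_windows = len(a) - w + 1
    out = np.ones(n_windows, dtype=bool)
    for j in np.flatnonzero(~np.isfinite(a)):
        # Slicing clamps the upper bound, so j + 1 > n_windows is harmless.
        out[max(j - w + 1, 0):j + 1] = False
    return out

a = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, np.inf, 9.0])
print(allfinite_by_invalidation(a, 3).tolist())
# [False, False, False, True, True, False, False]
```

The strided version does the same invalidation in bulk by writing `False` through overlapping rows of the mask, which is why it avoids the Python loop.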

Timings on the given sample data:

In [323]: N = 100_000_000
     ...: w = 50
     ...: 
     ...: np.random.seed(0)
     ...: a = np.random.rand(N)  # This is 10x shorter than my real data
     ...: idx = np.random.randint(0, len(a), size=len(a)//10)  # Simulate...
     ...: a[idx] = np.nan

# Original soln
In [324]: %timeit np.all(rolling_window(np.isfinite(a), w), axis=1)
1.61 s ± 14.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [325]: %timeit strided_allfinite(a, w)
556 ms ± 87.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Approach #2

We can leverage convolution:

np.convolve(np.isfinite(a), np.ones(w), 'valid') == w
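As a small worked example of why this works: convolving the boolean mask with a kernel of `w` ones gives, for each window, the number of finite elements in it, so a window is all-finite exactly when that count equals `w`. The tiny array below is made up purely for illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, np.inf, 9.0])
w = 3

finite = np.isfinite(a)
# Each entry of `counts` is the number of finite values in one window.
counts = np.convolve(finite, np.ones(w), 'valid')
# counts is [2, 2, 2, 3, 3, 2, 2] (stored as floats)
all_finite = counts == w
print(all_finite.tolist())
# [False, False, False, True, True, False, False]
```

Note that `np.ones(w)` is a float kernel, so the intermediate `counts` array is float64 and takes eight bytes per window.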

Approach #3

Binary erosion:

from scipy.ndimage import binary_erosion  # scipy.ndimage.morphology is a deprecated import path

m = np.isfinite(a)
out = binary_erosion(m, np.ones(w, dtype=bool))[w//2:len(a)-w+1+w//2]
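All three approaches above build at least one full-length intermediate array, which matters for the memory side of the question. One possible way to bound the temporaries is to process the array in blocks that overlap by `w - 1` elements. The following is only a sketch under assumptions: the helper name `allfinite_chunked` and the `chunk` parameter are hypothetical, it reuses the convolution test from Approach #2 inside each block, and it assumes a 1-D input with `len(a) >= w`. I have not timed it against the approaches above.

```python
import numpy as np

def allfinite_chunked(a, w, chunk=1_000_000):
    # Hypothetical helper: `chunk` is the number of window starts handled per block.
    a = np.asarray(a)
    n_windows = len(a) - w + 1
    out = np.empty(n_windows, dtype=bool)
    kernel = np.ones(w)
    for start in range(0, n_windows, chunk):
        stop = min(start + chunk, n_windows)
        # Windows starting in [start, stop) need elements [start, stop + w - 1).
        block = np.isfinite(a[start:stop + w - 1])
        # Count finite values per window within this block only.
        out[start:stop] = np.convolve(block, kernel, 'valid') == w
    return out
```

The result still holds one boolean per window, so the output itself is about as large as the isfinite mask; only the temporaries are limited to roughly `chunk + w` elements at a time.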