Find consecutive repeated NaNs in a numpy array

vol*_*olt 7 python arrays numpy

What is the best way to find the maximum number of consecutive repeated NaNs in a numpy array?

Example:

from numpy import nan

Input 1: [nan, nan, nan, 0.16, 1, 0.16, 0.9999, 0.0001, 0.16, 0.101, nan, 0.16]

Output 1: 3

Input 2: [nan, nan, 2, 1, 1, nan, nan, nan, nan, 0.101, nan, 0.16]

Output 2: 4

Div*_*kar 5

Here's one approach -

import numpy as np

def max_repeatedNaNs(a):
    # Mask of NaNs, padded with False at both ends so every run has two edges
    mask = np.concatenate(([False], np.isnan(a), [False]))
    if not mask.any():
        return 0
    else:
        # Falling-edge indices minus rising-edge indices give the count of
        # NaNs in each NaN group. Then, get max count as o/p.
        c = np.flatnonzero(mask[1:] < mask[:-1]) - \
            np.flatnonzero(mask[1:] > mask[:-1])
        return c.max()
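To make the edge-detection idea concrete, here is a small walk-through on Input 1 from the question. It is only an illustrative sketch: the names `starts`, `stops` and `run_lengths` are introduced here for readability and are not part of the answer's code.

```python
import numpy as np

a = np.array([np.nan, np.nan, np.nan, 0.16, 1, 0.16,
              0.9999, 0.0001, 0.16, 0.101, np.nan, 0.16])

# Pad the NaN mask with False on both sides so that a run touching either
# end of the array still has a rising and a falling edge.
mask = np.concatenate(([False], np.isnan(a), [False]))

# Rising edges (False -> True): index in `a` where each NaN run starts.
starts = np.flatnonzero(mask[1:] > mask[:-1])
# Falling edges (True -> False): index in `a` one past where each run ends.
stops = np.flatnonzero(mask[1:] < mask[:-1])

run_lengths = stops - starts
print(starts.tolist())       # [0, 10]
print(stops.tolist())        # [3, 11]
print(run_lengths.tolist())  # [3, 1]
print(run_lengths.max())     # 3
```

Because every run has exactly one rising and one falling edge, the two index arrays always have the same length, so the subtraction pairs them up run by run.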

Here's an improved version -

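def max_repeatedNaNs_v2(a):
    mask = np.concatenate(([False], np.isnan(a), [False]))
    if not mask.any():
        return 0
    else:
        # Indices where the mask flips; they alternate start, stop, start, stop, ...
        idx = np.nonzero(mask[1:] != mask[:-1])[0]
        # Stops (odd positions) minus starts (even positions) give the run lengths
        return (idx[1::2] - idx[::2]).max()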
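As a quick sanity check, the sketch below runs the second version on both inputs from the question and cross-checks it against a slow pure-Python reference. The helper `max_nan_run_reference` is hypothetical: it is written here purely for comparison and is not the solution benchmarked in this answer.

```python
import math
from itertools import groupby

import numpy as np


def max_repeatedNaNs_v2(a):
    mask = np.concatenate(([False], np.isnan(a), [False]))
    if not mask.any():
        return 0
    # Flip positions alternate start, stop, start, stop, ...
    idx = np.nonzero(mask[1:] != mask[:-1])[0]
    return (idx[1::2] - idx[::2]).max()


def max_nan_run_reference(values):
    # Hypothetical slow reference: group consecutive elements by "is NaN"
    # and keep the longest group whose key is True.
    runs = [len(list(group))
            for is_nan, group in groupby(values, key=math.isnan) if is_nan]
    return max(runs, default=0)


nan = np.nan
in1 = np.array([nan, nan, nan, 0.16, 1, 0.16, 0.9999, 0.0001, 0.16, 0.101, nan, 0.16])
in2 = np.array([nan, nan, 2, 1, 1, nan, nan, nan, nan, 0.101, nan, 0.16])
no_nans = np.array([1.0, 2.0, 3.0])

print(max_repeatedNaNs_v2(in1))      # 3
print(max_repeatedNaNs_v2(in2))      # 4
print(max_repeatedNaNs_v2(no_nans))  # 0

# Cross-check against the reference on the same inputs.
for arr in (in1, in2, no_nans):
    assert max_repeatedNaNs_v2(arr) == max_nan_run_reference(arr.tolist())
```

The early `return 0` matters: without it, an input with no NaNs would produce an empty array of run lengths, and calling `.max()` on an empty array raises a `ValueError`.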

Benchmarking, in response to @pltrdy's comment -

In [77]: a = np.random.rand(10000)

In [78]: a[np.random.choice(range(len(a)),size=1000,replace=0)] = np.nan

In [79]: %timeit contiguous_NaN(a) #@pltrdy's solution
100 loops, best of 3: 15.8 ms per loop

In [80]: %timeit max_repeatedNaNs(a)
10000 loops, best of 3: 103 µs per loop

In [81]: %timeit max_repeatedNaNs_v2(a)
10000 loops, best of 3: 86.4 µs per loop

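  • @pltrdy Thanks for commenting on the downvote. This solution is aimed at performance. Will add some runtime tests to demonstrate that. (4 upvotes)
  • @Tagc MSeifert did the heavy lifting and posted timings in [his post](http://stackoverflow.com/a/41722059/3293881). Have a look at those! (2 upvotes)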