numpy: split a 1D array of chunks separated by nans into a list of the chunks

ron*_*zon 6 python numpy

I have a numpy array in which only some of the values are valid; the rest are nan. For example:

[nan,nan, 1 , 2 , 3 , nan, nan, 10, 11 , nan, nan, nan, 23, 1, nan, 7, 8]

I want to split it into a list of chunks, each holding one contiguous run of valid data. The result would be:

[[1,2,3], [10,11], [23,1], [7,8]]

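I managed to do it by iterating over the array, checking isfinite() on each element and building (start, stop) index pairs.

But... it's painful...

Do you perhaps have a better idea?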
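For comparison, the (start, stop)-index idea from the question can be done without an explicit Python loop. This sketch is not from the thread: the function name using_diff is hypothetical, and it assumes a 1-D input of floats. It pads the np.isfinite mask with False at both ends, so np.diff marks every run of finite values with a +1 at its start and a -1 just past its end:

```python
import numpy as np

def using_diff(a):
    # Hypothetical helper: vectorized version of the (start, stop) approach.
    a = np.asarray(a, dtype=float)
    valid = np.isfinite(a)
    # Pad with False so every finite run has both a rising and a falling edge.
    padded = np.concatenate(([False], valid, [False]))
    # Nonzero diffs alternate: start index, stop index (exclusive), start, ...
    edges = np.flatnonzero(np.diff(padded.astype(np.int8)))
    starts, stops = edges[::2], edges[1::2]
    return [a[start:stop] for start, stop in zip(starts, stops)]

nan = np.nan
x = [nan, nan, 1, 2, 3, nan, nan, 10, 11, nan, nan, nan, 23, 1, nan, 7, 8]
print([chunk.tolist() for chunk in using_diff(x)])
# [[1.0, 2.0, 3.0], [10.0, 11.0], [23.0, 1.0], [7.0, 8.0]]
```

Each chunk is a slice (a view) of the converted array, so no per-element Python work is done to find the boundaries.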

unu*_*tbu 8

Here is another possibility:

import numpy as np
nan = np.nan

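def using_clump(a):
    # Convert first so that slicing returns ndarrays even when `a` is a plain
    # Python list like `x` below (slicing a list would return lists).
    a = np.asarray(a, dtype=float)
    # masked_invalid masks every nan/inf; clump_unmasked returns one slice
    # per contiguous run of unmasked (finite) values.
    return [a[s] for s in np.ma.clump_unmasked(np.ma.masked_invalid(a))]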

x = [nan,nan, 1 , 2 , 3 , nan, nan, 10, 11 , nan, nan, nan, 23, 1, nan, 7, 8]

In [56]: using_clump(x)
Out[56]: 
[array([ 1.,  2.,  3.]),
 array([ 10.,  11.]),
 array([ 23.,   1.]),
 array([ 7.,  8.])]
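To see what the two np.ma calls contribute, the sketch below prints the intermediate slices for the example list. The bounds are converted to plain ints only for readable printing; the variable names are illustrative:

```python
import numpy as np

nan = np.nan
x = [nan, nan, 1, 2, 3, nan, nan, 10, 11, nan, nan, nan, 23, 1, nan, 7, 8]

# masked_invalid returns a masked array in which every nan (or inf) is masked.
masked = np.ma.masked_invalid(x)

# clump_unmasked returns a list of slice objects, one per contiguous run of
# unmasked entries; convert the bounds to ints for a readable printout.
bounds = [(int(s.start), int(s.stop)) for s in np.ma.clump_unmasked(masked)]
print(bounds)  # [(2, 5), (7, 9), (12, 14), (15, 17)]

# Slicing a plain list with these slices would give lists, so convert to an
# ndarray first if each chunk should be an array.
arr = np.asarray(x, dtype=float)
chunks = [arr[s] for s in np.ma.clump_unmasked(masked)]
```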

Some benchmarks comparing using_clump with using_groupby:

import itertools as IT
groupby = IT.groupby

def using_groupby(a):
    # groupby yields consecutive runs keyed by np.isfinite; keep only the finite runs.
    return [list(v) for k, v in groupby(a, np.isfinite) if k]

In [58]: %timeit using_clump(x)
10000 loops, best of 3: 37.3 us per loop

In [59]: %timeit using_groupby(x)
10000 loops, best of 3: 53.1 us per loop
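A likely reason for the gap: itertools.groupby walks the input one element at a time in Python and calls the key function (np.isfinite) on every element, while clump_unmasked locates run boundaries with vectorized operations on the whole mask. The sketch below only prints the (key, run) pairs that groupby produces for the example list, which using_groupby then filters on the key:

```python
import itertools

import numpy as np

nan = np.nan
x = [nan, nan, 1, 2, 3, nan, nan, 10, 11, nan, nan, nan, 23, 1, nan, 7, 8]

# groupby calls np.isfinite on every element and starts a new group whenever
# the key changes, so nan runs and finite runs alternate here.
runs = [(bool(key), list(run)) for key, run in itertools.groupby(x, np.isfinite)]
for is_finite, run in runs:
    print(is_finite, run)
```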

The advantage grows for larger arrays. Since x is a plain list, x*1000 repeats it 1000 times (17,000 elements):

In [9]: x = x*1000
In [12]: %timeit using_clump(x)
100 loops, best of 3: 5.69 ms per loop

In [13]: %timeit using_groupby(x)
10 loops, best of 3: 60 ms per loop
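To rerun a comparison like this outside IPython, the standard-library timeit module can be used. This is a sketch, not the answerer's script: it redefines both functions so the file is self-contained (adding np.asarray to using_clump), and the number/repeat values are arbitrary. Absolute timings will vary with the machine and NumPy version.

```python
import functools
import itertools
import timeit

import numpy as np

nan = np.nan


def using_clump(a):
    # Same approach as using_clump above, plus np.asarray so chunks are ndarrays.
    a = np.asarray(a, dtype=float)
    return [a[s] for s in np.ma.clump_unmasked(np.ma.masked_invalid(a))]


def using_groupby(a):
    # Keep the runs of consecutive elements for which np.isfinite is True.
    return [list(v) for k, v in itertools.groupby(a, np.isfinite) if k]


base = [nan, nan, 1, 2, 3, nan, nan, 10, 11, nan, nan, nan, 23, 1, nan, 7, 8]
x_big = base * 1000  # list repetition: 17,000 elements, 4,000 finite runs

# Sanity check before timing: both functions find the same chunks.
assert [c.tolist() for c in using_clump(x_big)] == using_groupby(x_big)

for func in (using_clump, using_groupby):
    # Best of 3 repeats of 10 calls each, reported per call.
    timer = functools.partial(func, x_big)
    per_call = min(timeit.repeat(timer, number=10, repeat=3)) / 10
    print(f"{func.__name__}: {per_call * 1e3:.2f} ms per call")
```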