NumPy: calculate averages with NaNs removed

Mik*_*e T 39 python numpy nan

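How can I calculate the mean along an axis of a matrix while excluding nan values from the calculation? (For R people, think na.rm = TRUE.)

Here is my [non-]working example: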

import numpy as np
dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])
print(dat)
print(dat.mean(1))  # [  2.  nan  nan  nan]

With the NaNs removed, my expected output would be:

array([ 2.,  4.5,  6.,  nan])

Jos*_*del 35

I think what you want is a masked array:

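import numpy as np

dat = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 6, np.nan], [np.nan, np.nan, np.nan]])
mdat = np.ma.masked_array(dat, np.isnan(dat))  # mask every NaN entry
mm = np.mean(mdat, axis=1)  # row means over the unmasked values only
print(mm.filled(np.nan))  # the desired answer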

Edit: combining all of the timing data

from timeit import Timer

# Shared setup: a 1000x1000 normal sample with NaNs scattered in one corner.
# Note: scipy's nanmean exists only in old SciPy releases; it was later removed.
setupstr = """
import numpy as np
from scipy.stats.stats import nanmean
dat = np.random.normal(size=(1000,1000))
ii = np.ix_(np.random.randint(0,99,size=50),np.random.randint(0,99,size=50))
dat[ii] = np.nan
"""

# Method 1: explicit masked array built from np.isnan.
method1 = """
mdat = np.ma.masked_array(dat,np.isnan(dat))
mm = np.mean(mdat,axis=1)
mm.filled(np.nan)
"""

N = 2
t1 = Timer(method1, setupstr).timeit(N)
t2 = Timer("[np.mean([l for l in d if not np.isnan(l)]) for d in dat]", setupstr).timeit(N)
t3 = Timer("np.array([r[np.isfinite(r)].mean() for r in dat])", setupstr).timeit(N)
t4 = Timer("np.ma.masked_invalid(dat).mean(axis=1)", setupstr).timeit(N)
t5 = Timer("nanmean(dat,axis=1)", setupstr).timeit(N)

# Report each time and its ratio relative to method 1.
print('Time: %f\tRatio: %f' % (t1, t1 / t1))
print('Time: %f\tRatio: %f' % (t2, t2 / t1))
print('Time: %f\tRatio: %f' % (t3, t3 / t1))
print('Time: %f\tRatio: %f' % (t4, t4 / t1))
print('Time: %f\tRatio: %f' % (t5, t5 / t1))

Returns:

Time: 0.045454  Ratio: 1.000000
Time: 8.179479  Ratio: 179.950595
Time: 0.060988  Ratio: 1.341755
Time: 0.070955  Ratio: 1.561029
Time: 0.065152  Ratio: 1.433364

  • @mathtick Also, as far as I can tell, there is no `scipy.nanmean` in scipy 0.10 or 0.11. There are `scipy.stats.stats.nanmean` and `scipy.stats.nanmean`, which are equivalent and which I tested above. (4 upvotes)

dep*_*ted 18

If performance matters, you should use bottleneck.nanmean() instead:

http://pypi.python.org/pypi/Bottleneck
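As a rough illustration of the Bottleneck suggestion, the sketch below calls `bottleneck.nanmean` with an `axis` argument on a small array. It assumes the third-party Bottleneck package is installed; the fallback to `np.nanmean` is an addition made here only so the snippet still runs without it.

```python
import numpy as np

try:
    import bottleneck as bn  # third-party package, assumed to be installed
    nanmean = bn.nanmean
except ImportError:
    # Assumption for this sketch: fall back to NumPy if Bottleneck is missing.
    nanmean = np.nanmean

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan]])

# NaNs are skipped within each row.
row_means = nanmean(dat, axis=1)
print(row_means)  # row means: 2.0, 4.5, 6.0
```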


Anonymous 12

Assuming you also have SciPy installed:

http://www.scipy.org/doc/api_docs/SciPy.stats.stats.html#nanmean

  • Just for completeness, since I have timed all of the other code: `stats.stats.nanmean` is about 1.5 times slower than the `np.ma` solution. (5 upvotes)

Ale*_*der 10

Starting with numpy 1.8 (released 2013-10-30), nanmean does exactly what you need:

>>> import numpy as np
>>> np.nanmean(np.array([1.5, 3.5, np.nan]))
2.5
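Applied to the 2-D array from the question, a minimal sketch looks like this. The warning filter is an addition here: a row with no valid values makes `np.nanmean` return nan and emit a RuntimeWarning, which the sketch chooses to silence.

```python
import warnings
import numpy as np

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])

# The all-NaN last row has nothing to average, so its mean is nan and
# NumPy emits a RuntimeWarning; ignoring it here is a choice of this sketch.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    row_means = np.nanmean(dat, axis=1)

print(row_means)  # row means: 2.0, 4.5, 6.0, nan
```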


Sve*_*ach 8

A masked array with the nans filtered out can also be created on the fly:

print(np.ma.masked_invalid(dat).mean(1))


Ben*_*min 8

You can always find a workaround in something like:

numpy.nansum(dat, axis=1) / numpy.sum(numpy.isfinite(dat), axis=1)

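A skipna option for numpy.mean was at one point expected in NumPy 2.0, but numpy.mean has no such option; numpy.nanmean (available since NumPy 1.8) covers this case.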
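The sum-over-count workaround can be sketched as follows on the array from the question. Two details are choices made for this sketch rather than part of the answer: the count uses `~np.isnan` instead of `np.isfinite`, so that it matches what `np.nansum` actually adds up if infinities are present, and `np.errstate` silences the 0/0 warning for rows with no valid values.

```python
import numpy as np

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])

totals = np.nansum(dat, axis=1)       # NaNs count as zero in the sum
counts = (~np.isnan(dat)).sum(axis=1)  # number of non-NaN values per row

# An all-NaN row gives 0.0 / 0, which evaluates to nan; suppress the warning.
with np.errstate(invalid="ignore", divide="ignore"):
    row_means = totals / counts

print(row_means)  # row means: 2.0, 4.5, 6.0, nan
```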