Taking np.average while ignoring NaNs?

Chr*_*neB 7 python numpy latitude-longitude weighted-average

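I have a matrix of shape (64,17) corresponding to time and latitude. I want to take a latitude-weighted average, which I know np.average can do because, unlike np.nanmean (which I used to average over longitude), it accepts weights as an argument. However, np.average does not ignore NaNs the way np.nanmean does, so the first 5 entries of each row are included in the latitude average and the whole time series comes out as NaN.

Is there a way to take a weighted average without the NaNs being included in the calculation?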

import numpy as np
from netCDF4 import Dataset   # assumed: Dataset comes from the netCDF4 package

file = Dataset("sst_aso_1951-2014latlon_seasavgs.nc")
sst = file.variables['sst']
lat = file.variables['lat']

sst_filt = np.asarray(sst)
missing_values_indices = sst_filt < -8000000   #missing values have value -infinity
sst_filt[missing_values_indices] = np.nan      #all missing values set to NaN

weights = np.cos(np.deg2rad(lat))
sst_zonalavg = np.nanmean(sst_filt, axis=2)
print(sst_zonalavg[0,:])
sst_ts = np.average(sst_zonalavg, axis=1, weights=weights)
print(sst_ts[:])

Output:

[ nan nan nan nan nan
 27.08499908 27.33333397 28.1457119 28.32899857 28.34454346
 28.27285767 28.18571472 28.10199928 28.10812378 28.03411865
 28.06411552 28.16529465]

[ nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan]
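The behaviour can be reproduced without the netCDF file. The following is a minimal sketch; the array `zonal` and the latitude values are made-up stand-ins for `sst_zonalavg` and `lat`, and only the NaN pattern matters.

```python
import numpy as np

# Hypothetical stand-in for sst_zonalavg: 3 time steps x 4 latitudes,
# with the first latitude missing at every time step.
zonal = np.array([[np.nan, 27.0, 28.0, 28.5],
                  [np.nan, 27.5, 28.0, 28.0],
                  [np.nan, 26.5, 27.5, 28.5]])
lat = np.array([-30.0, -10.0, 10.0, 30.0])   # made-up latitudes in degrees
weights = np.cos(np.deg2rad(lat))

plain = np.average(zonal, axis=1, weights=weights)   # NaN propagates into every row
unweighted = np.nanmean(zonal, axis=1)               # ignores NaN but takes no weights

print(plain)
print(unweighted)
```

A single NaN in a row is enough to turn that row's weighted average into NaN, which is why the whole time series above is NaN.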

Ale*_*lex 11

You can create a masked array like this:

import numpy as np

data = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 6, np.nan], [0, 0, 0]])
# mask every NaN entry so it is excluded from the average
masked_data = np.ma.masked_array(data, np.isnan(data))
# calculate your weighted average here instead
weights = [1, 1, 1]
average = np.ma.average(masked_data, axis=1, weights=weights)
# this gives you the result
result = average.filled(np.nan)
print(result)

This outputs:

[ 2.   4.5  6.   0. ]

  • I've updated the answer and it should work properly now; you need to use `np.ma.average` for masked arrays. Note the `.ma`. (2 upvotes)
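Applied to the question's layout (time x latitude, with cosine-of-latitude weights), the masked-array approach might look like the sketch below. The data and latitudes are invented for illustration, and `np.ma.masked_invalid` is used as a shorthand for masking the NaN entries.

```python
import numpy as np

# Hypothetical stand-in for sst_zonalavg (time x latitude) with leading NaN columns.
zonal = np.array([[np.nan, np.nan, 27.0, 28.0, 28.5],
                  [np.nan, np.nan, 27.5, 28.0, 28.0]])
lat = np.array([-40.0, -20.0, 0.0, 20.0, 40.0])   # made-up latitudes in degrees
weights = np.cos(np.deg2rad(lat))

masked = np.ma.masked_invalid(zonal)              # masks NaN (and inf) entries
sst_ts = np.ma.average(masked, axis=1, weights=weights)

# Rows where every latitude is masked come back masked; turn them into NaN.
sst_ts = sst_ts.filled(np.nan)
print(sst_ts)
```

The weights of the masked latitudes are dropped from both the numerator and the denominator, so each row is normalized only by the weights of the latitudes that actually have data.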

Div*_*kar 7

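You could simply multiply the input array by the weights and sum along the specified axis, ignoring NaNs, with np.nansum. So for your case, assuming the weights are to be applied along axis = 1 of the input array sst_filt, the sum would be: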

np.nansum(sst_filt*weights,axis=1)

To account for the NaNs when averaging, divide by the sum of the weights at the non-NaN positions only. We end up with:

def nanaverage(A,weights,axis):
    # numerator: weighted sum skipping NaNs; denominator: sum of the weights at non-NaN positions
    return np.nansum(A*weights,axis=axis)/((~np.isnan(A))*weights).sum(axis=axis)

Sample run:

In [200]: sst_filt  # 2D array case
Out[200]: 
array([[  0.,   1.],
       [ nan,   3.],
       [  4.,   5.]])

In [201]: weights
Out[201]: array([ 0.25,  0.75])

In [202]: nanaverage(sst_filt,weights=weights,axis=1)
Out[202]: array([0.75, 3.  , 4.75])
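Note that `A*weights` relies on broadcasting, so with 1-D weights it lines up with the last axis of `A`. The variant below is a sketch, not part of the original answer: the helper name `nanaverage_along` is hypothetical, it reshapes the weights so they follow whichever axis is requested, and it returns NaN for slices that contain no valid data instead of dividing by zero.

```python
import numpy as np

def nanaverage_along(A, weights, axis):
    """Weighted average of A along `axis`, skipping NaNs (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Reshape 1-D weights so they broadcast along `axis` rather than the last axis.
    shape = [1] * A.ndim
    shape[axis] = -1
    w = weights.reshape(shape)
    valid = ~np.isnan(A)
    num = np.nansum(A * w, axis=axis)
    den = (valid * w).sum(axis=axis)
    # Slices with no valid data have den == 0; leave NaN there instead of dividing.
    out = np.full(num.shape, np.nan)
    np.divide(num, den, out=out, where=den != 0)
    return out

A = np.array([[0.0, 1.0],
              [np.nan, 3.0],
              [np.nan, np.nan]])
w = np.array([0.25, 0.75])
print(nanaverage_along(A, w, axis=1))     # last row has no valid data
print(nanaverage_along(A.T, w, axis=0))   # same data, weights along axis 0
```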


det*_*eto 5

I'd probably just select the part of the array that isn't NaN and then use those same indices to select the weights.

For example:

import numpy as np
data = np.random.rand(10)
weights = np.random.rand(10)
data[[2, 4, 8]] = np.nan

print(data)
# [ 0.32849204,  0.90310062,         nan,  0.58580299,         nan,
#    0.934721  ,  0.44412978,  0.78804409,         nan,  0.24942098]

ii = ~np.isnan(data)   # boolean mask of the non-NaN entries
print(ii)
# [ True  True False  True False  True  True  True False  True]

result = np.average(data[ii], weights=weights[ii])
print(result)
# .6470319

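Edit: I realized this doesn't work for 2D arrays. In that case, I'd probably just set both the values and the weights to zero wherever there is a NaN. This gives the same result as if those indices had not been included in the calculation.

Before running np.average: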

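nan_mask = np.isnan(data)   # compute the mask once, before the NaNs are overwritten
data[nan_mask] = 0
weights[nan_mask] = 0       # weights must have the same shape as data
result = np.average(data, weights=weights)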

Or create copies if you want to keep track of which indices were NaN.
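For the 2D case, a sketch of the zeroing approach that works on copies could look like this. The arrays are invented for illustration, and the 1-D weights are expanded to the shape of the data (via `np.broadcast_to`) so they can be zeroed per element.

```python
import numpy as np

# Made-up 2D data (rows x latitudes) and 1-D weights along the last axis.
data = np.array([[1.0, 2.0, np.nan],
                 [np.nan, 5.0, 6.0]])
weights = np.array([0.2, 0.3, 0.5])

nan_mask = np.isnan(data)                        # remember where the NaNs were
data_filled = np.where(nan_mask, 0.0, data)      # copy with NaNs replaced by 0
weights_2d = np.where(nan_mask, 0.0,
                      np.broadcast_to(weights, data.shape))  # per-element weights

result = np.average(data_filled, axis=1, weights=weights_2d)
print(result)
```

The original `data` and `weights` are left untouched. One caveat: if a row consists only of NaNs, its weights sum to zero and `np.average` raises a `ZeroDivisionError`, so such rows would need to be handled separately.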