ora*_*nge 13 python arrays numpy
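I want to select certain elements of an array and compute a weighted average based on their values. However, filtering with a condition destroys the array's original structure: arr, which has shape (2, 2, 3, 2), becomes a one-dimensional array. That is useless to me, because not all of these elements need to be combined with each other later (only those within the same subarray). How can I avoid this flattening?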
>>> arr = np.asarray([ [[[1, 11], [2, 22], [3, 33]], [[4, 44], [5, 55], [6, 66]]], [ [[7, 77], [8, 88], [9, 99]], [[0, 32], [1, 33], [2, 34] ]] ])
>>> arr
array([[[[ 1, 11],
         [ 2, 22],
         [ 3, 33]],
        [[ 4, 44],
         [ 5, 55],
         [ 6, 66]]],
       [[[ 7, 77],
         [ 8, 88],
         [ 9, 99]],
        [[ 0, 32],
         [ 1, 33],
         [ 2, 34]]]])
>>> arr.shape
(2, 2, 3, 2)
>>> arr[arr>3]
array([11, 22, 33, 4, 44, 5, 55, 6, 66, 7, 77, 8, 88, 9, 99, 32, 33,
34])
>>> arr[arr>3].shape
(18,)
Ale*_*lex 14
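Take a look at numpy.where
http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html
To keep the same dimensions you need a fill value. In the example below I use 0, but you could also use np.nan (which turns the result into a float array).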
np.where(arr>3, arr, 0)
This returns:
array([[[[ 0, 11],
         [ 0, 22],
         [ 0, 33]],
        [[ 4, 44],
         [ 5, 55],
         [ 6, 66]]],
       [[[ 7, 77],
         [ 8, 88],
         [ 9, 99]],
        [[ 0, 32],
         [ 0, 33],
         [ 0, 34]]]])
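With a fill value of 0, any later mean would count the zeros as data. A minimal sketch of the np.nan variant, assuming you then reduce with a NaN-aware function such as np.nanmean (the choice of axis=-1 is only for illustration):

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

# Filling with NaN promotes the integer array to float64.
filled = np.where(arr > 3, arr, np.nan)

# nanmean skips the NaN placeholders; axis=-1 averages each innermost pair.
pair_means = np.nanmean(filled, axis=-1)

print(filled.shape, filled.dtype)  # (2, 2, 3, 2) float64
print(pair_means[0, 0])            # [11. 22. 33.]
```

The shape is unchanged, and the rejected elements no longer pull the averages toward zero.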
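You might consider using an np.ma.masked_array to represent the subset of elements that satisfy your condition. Note that masked_less(arr, 3) below masks values strictly less than 3, so 3 itself stays visible: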
import numpy as np
arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])
masked_arr = np.ma.masked_less(arr, 3)
print(masked_arr)
# [[[[-- 11]
# [-- 22]
# [3 33]]
# [[4 44]
# [5 55]
# [6 66]]]
# [[[7 77]
# [8 88]
# [9 99]]
# [[-- 32]
# [-- 33]
# [-- 34]]]]
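As you can see, the masked array keeps its original dimensions. You can access the underlying data and the mask through the .data and .mask attributes, respectively. Most numpy functions ignore masked values, for example: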
# mean of whole array
print(arr.mean())
# 26.75
# mean of non-masked elements only
print(masked_arr.mean())
# 33.4736842105
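Since the question asks for a weighted average, here is a minimal sketch using np.ma.average on the same masked array. The weights vector and the choice of axis=2 are assumptions made for illustration, not something the question specifies:

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])
masked_arr = np.ma.masked_less(arr, 3)

# Hypothetical weights: one per position along axis 2 (length 3).
weights = np.array([1.0, 2.0, 3.0])

# Masked entries contribute neither to the weighted sum nor to the
# normalising sum of weights.
weighted = np.ma.average(masked_arr, axis=2, weights=weights)

print(weighted.shape)  # (2, 2, 2)
print(weighted)
```

Slices in which every element is masked (here the column holding 0, 1, 2) come back masked rather than as a number.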
The result of an element-wise operation between a masked array and a non-masked array also preserves the mask:
masked_sum = masked_arr + np.random.randn(*arr.shape)
print(masked_sum)
# [[[[-- 11.359989067421582]
# [-- 23.249092437269162]
# [3.326111354088174 32.679132708120726]]
# [[4.289134334263137 43.38559221094378]
# [6.028063054523145 53.5043991898567]
# [7.44695154979811 65.56890530368757]]]
# [[[8.45692625294376 77.36860675985407]
# [5.915835159196378 87.28574554110307]
# [8.251106168209688 98.7621940026713]]
# [[-- 33.24398289945855]
# [-- 33.411941757624284]
# [-- 34.964817895873715]]]]
The sum is computed only over the non-masked values of masked_arr; you can see this by looking at masked_sum.data:
print(masked_sum.data)
# [[[[ 1. 11.35998907]
# [ 2. 23.24909244]
# [ 3.32611135 32.67913271]]
# [[ 4.28913433 43.38559221]
# [ 6.02806305 53.50439919]
# [ 7.44695155 65.5689053 ]]]
# [[[ 8.45692625 77.36860676]
# [ 5.91583516 87.28574554]
# [ 8.25110617 98.762194 ]]
# [[ 0. 33.2439829 ]
# [ 1. 33.41194176]
# [ 2. 34.9648179 ]]]]
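The same point can be checked programmatically rather than by eye. This is a small sketch with a seeded generator (the seed is arbitrary and only makes the run reproducible):

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])
masked_arr = np.ma.masked_less(arr, 3)

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
masked_sum = masked_arr + rng.standard_normal(arr.shape)

# The mask of the result matches the mask of the masked operand.
same_mask = np.array_equal(np.ma.getmaskarray(masked_sum),
                           np.ma.getmaskarray(masked_arr))
print(same_mask)           # True
print(masked_sum.count())  # 19 (number of unmasked elements)

# filled() returns a plain ndarray with masked slots replaced.
plain = masked_sum.filled(np.nan)
```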
Look at arr>3:
In [71]: arr>3
Out[71]:
array([[[[False,  True],
         [False,  True],
         [False,  True]],
        [[ True,  True],
         [ True,  True],
         [ True,  True]]],
       [[[ True,  True],
         [ True,  True],
         [ True,  True]],
        [[False,  True],
         [False,  True],
         [False,  True]]]], dtype=bool)
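arr[arr>3] selects the elements where the mask is True. What kind of structure or shape would you expect that selection to have? Flat is the only thing that makes sense, isn't it? arr itself is unchanged.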
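To see why no rectangular shape is possible, count how many elements survive in each subarray. A quick sketch (grouping by the first two axes is just one way to slice it):

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])
mask = arr > 3

# Number of selected elements in each (3, 2) subarray.
counts = mask.sum(axis=(2, 3))
print(counts)
# [[3 6]
#  [6 3]]

print(arr[mask].size)  # 18
```

The subarrays keep 3, 6, 6 and 3 elements, so the selection is ragged and cannot be stored as a regular N-d array.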
You could zero out the items that don't satisfy the mask:
In [84]: arr1=arr.copy()
In [85]: arr1[arr<=3]=0
In [86]: arr1
Out[86]:
array([[[[ 0, 11],
         [ 0, 22],
         [ 0, 33]],
        [[ 4, 44],
         [ 5, 55],
         [ 6, 66]]],
       [[[ 7, 77],
         [ 8, 88],
         [ 9, 99]],
        [[ 0, 32],
         [ 0, 33],
         [ 0, 34]]]])
Now you can take weighted sums or averages over the various dimensions.
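One caveat: a plain mean over the zeroed array still counts the zeros as data. A minimal sketch of a weighted average that normalises by the weights of the selected elements only, assuming hypothetical weights along axis 2:

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])
mask = arr > 3

# Hypothetical weights, one per position along axis 2.
weights = np.array([1.0, 2.0, 3.0])

# Zero the weight wherever the element was rejected; shape (2, 2, 3, 2).
w = mask * weights[:, None]

num = (arr * w).sum(axis=2)  # weighted sum of the selected elements
den = w.sum(axis=2)          # total weight of the selected elements

# Slices with no selected element get NaN instead of a division by zero.
avg = np.full(num.shape, np.nan)
np.divide(num, den, out=avg, where=den > 0)
print(avg)
```

Multiplying by the zeroed weights has the same effect on the numerator as zeroing arr itself; the difference is that the denominator also drops the rejected elements.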
np.nonzero (or np.where) might also be useful; it gives you the indices of the selected items:
In [88]: np.nonzero(arr>3)
Out[88]:
(array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
array([0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1]),
array([0, 1, 2, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2, 0, 1, 2]),
array([1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1]))
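Those index arrays tell you which subarray each selected value came from, so the flat selection can still be aggregated per group. A sketch, assuming the groups of interest are the (i, j) pairs of the first two axes:

```python
import numpy as np

arr = np.asarray([[[[1, 11], [2, 22], [3, 33]],
                   [[4, 44], [5, 55], [6, 66]]],
                  [[[7, 77], [8, 88], [9, 99]],
                   [[0, 32], [1, 33], [2, 34]]]])

idx = np.nonzero(arr > 3)
selected = arr[idx]  # same values, same order, as arr[arr > 3]

# Collapse the first two index arrays into one group label per element.
group = np.ravel_multi_index(idx[:2], arr.shape[:2])

# Sum the selected values within each (i, j) subarray.
n_groups = arr.shape[0] * arr.shape[1]
sums = np.bincount(group, weights=selected, minlength=n_groups)
sums = sums.reshape(arr.shape[:2])
print(sums)
# [[ 66. 180.]
#  [288.  99.]]
```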