Randomly remove 30% of the values in a numpy array

kon*_*tin 1 python arrays numpy

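I have a 2D numpy array containing my values (some of which may be NaN). I want to remove 30% of the non-NaN values and replace them with the mean of the array. How can I do that? What I have tried so far: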

def spar_removal(array, mean_value, sparseness):
    array1 = deepcopy(array)
    array2 = array1
    spar_size = int(round(array2.shape[0]*array2.shape[1]*sparseness))
    for i in range (0, spar_size):
        index = np.random.choice(np.where(array2 != mean_value)[1])
        array2[0, index] = mean_value
    return array2

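But this only ever picks from the same row of the array. How can I remove values from the whole array? It seems that choice only works on one-dimensional input. I think what I want is to compute the (x, y) pairs that I will replace with mean_value.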

jed*_*rds 5

There may be a better way, but consider:

import numpy as np

x = np.array([[1,2,3,4],
              [1,2,3,4],
              [np.nan, np.nan, np.nan, np.nan],
              [1,2,3,4]])

# Get a vector of 1-d indexed indexes of non NaN elements
indices = np.where(np.isfinite(x).ravel())[0]

# Shuffle the indices, select the first 30% (rounded down with int())
to_replace = np.random.permutation(indices)[:int(indices.size * 0.3)]

# Replace those indices with the mean (ignoring NaNs)
x[np.unravel_index(to_replace, x.shape)] = np.nanmean(x)

print(x)

Example output

[[ 2.5 2. 2.5 4. ]
 [ 1. 2. 3. 4. ]
 [ nan nan nan nan]
 [ 2.5 2. 3. 4. ]]

The NaNs are never changed, and floor(0.3 * number of non-NaN elements) values are set to the mean (the mean computed ignoring NaNs).
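If this is needed more than once, the same idea can be wrapped in a function. The following is only a sketch: the name replace_random_with_mean and its parameters fraction and rng are made up for illustration, and it assumes a float result and a copy (rather than an in-place change) are acceptable.

```python
import numpy as np

def replace_random_with_mean(arr, fraction=0.3, rng=None):
    """Return a copy of arr in which a random fraction of the non-NaN
    values has been replaced by the NaN-ignoring mean of arr.
    (Hypothetical helper, not part of numpy.)"""
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(arr, dtype=float)  # copy; float so NaN and the mean fit

    # Flat (1-D) indices of all non-NaN entries
    candidates = np.flatnonzero(~np.isnan(out))

    # Number of entries to replace, rounded down
    n_replace = int(candidates.size * fraction)

    # Pick that many distinct flat indices
    chosen = rng.choice(candidates, size=n_replace, replace=False)

    # The mean on the right-hand side is evaluated before any value changes
    out.flat[chosen] = np.nanmean(out)
    return out

x = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [np.nan, np.nan, np.nan, np.nan],
              [1, 2, 3, 4]])

print(replace_random_with_mean(x, 0.3))
```

Passing a seeded generator, e.g. rng=np.random.default_rng(0), makes the selection reproducible. Indexing through out.flat avoids the np.unravel_index step, since the chosen indices are already flat.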