Posts by Leo*_*Leo

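Masking 0 values during normalization

I am normalizing a dataset, but because of padding the data contains many zeros.

I can mask them during model training, but the zeros are obviously altered when I apply normalization.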

from sklearn.preprocessing import StandardScaler,MinMaxScaler

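I am currently using the sklearn library for normalization.

For example, given a 3D array of shape (4, 3, 5), laid out as (batch, step, features):

The amount of zero padding varies across the batch, because these are features extracted from audio files of different lengths using a fixed window size.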

[[[0 0 0 0 0],
  [0 0 0 0 0],
  [0 0 0 0 0]]

 [[1 2 3 4 5],
  [4 5 6 7 8],
  [9 10 11 12 13]],

 [[14 15 16 17 18],
  [0 0 0 0 0],
  [24 25 26 27 28]],

 [[0 0 0 0 0],
  [423 2 230 60 70],
  [0 0 0 0 0]]
] …
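One possible way to keep the padding out of the statistics is to compute the mean and standard deviation over the non-padded time steps only, and to leave the padded rows at 0 so a masking layer can still detect them. The sketch below uses plain NumPy rather than sklearn; the function name `masked_standardize` and the rule "a time step is padding when all of its features are 0" are assumptions for illustration, not part of the original post.

```python
import numpy as np

def masked_standardize(x, pad_value=0.0, eps=1e-8):
    """Standardize each feature using only the non-padded time steps.

    x has shape (batch, step, features). A time step is treated as padding
    when every feature in it equals pad_value (an assumption).
    """
    x = np.asarray(x, dtype=float)
    # True for real time steps, False for all-padding rows: shape (batch, step).
    mask = ~np.all(x == pad_value, axis=-1)
    valid = x[mask]                      # (n_valid_steps, features)
    mean = valid.mean(axis=0)            # per-feature mean over valid steps
    std = valid.std(axis=0)              # per-feature std over valid steps
    out = x.copy()
    out[mask] = (valid - mean) / (std + eps)   # padded rows stay at pad_value
    return out, mask, mean, std

# The fully shown part of the example array from the question.
X = np.array([
    [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]],
    [[1, 2, 3, 4, 5], [4, 5, 6, 7, 8], [9, 10, 11, 12, 13]],
    [[14, 15, 16, 17, 18], [0, 0, 0, 0, 0], [24, 25, 26, 27, 28]],
    [[0, 0, 0, 0, 0], [423, 2, 230, 60, 70], [0, 0, 0, 0, 0]],
])

X_scaled, mask, mean, std = masked_standardize(X)
print(mask.sum())          # number of non-padded time steps
print(X_scaled[2, 1])      # a padded row, still all zeros
```

For a test set, the same `mean` and `std` from the training data would be reused instead of being recomputed. Note that a real time step can be mapped close to 0 by the scaling, so a mask derived before scaling (as returned here) is safer than one recomputed from the scaled values.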


python machine-learning scikit-learn

Leo*_*Leo

2020-11-03

Score: 6

Answers: 1

Views: 661

How to normalize a specific dimension of a 3D array

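sklearn.preprocessing.normalize only supports 2D arrays. However, I currently have a 3D array (batch, step, features) for training an LSTM model, and I want to normalize the features.

I have already tried tf.keras.utils.normalize(X_train, axis=-1, order=2), but the result is not correct.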
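A likely reason the result looks wrong: with `order=2`, `tf.keras.utils.normalize` divides each vector along the chosen axis by its L2 norm, so every time step becomes a unit-length vector. That is a different operation from per-feature standardization. The snippet below is a NumPy re-creation of that behavior for illustration (the helper name `l2_normalize` is made up here, and it is not the TensorFlow implementation itself).

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Divide each vector along `axis` by its L2 norm (zero vectors are left as-is)."""
    norm = np.linalg.norm(x, ord=2, axis=axis, keepdims=True)
    norm[norm == 0] = 1.0          # avoid division by zero for all-zero rows
    return x / norm

# One sample, two time steps, two features.
X = np.array([[[3.0, 4.0],
               [6.0, 8.0]]])

X_l2 = l2_normalize(X)
print(X_l2)   # both time steps become [0.6, 0.8]
```

The two time steps differ by a factor of 2 in the input but are identical after L2 normalization, so magnitude information across steps is lost and no feature column is centered or scaled on its own.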

Another approach is to collapse the 3D array into a 2D array:

print(X_train.shape)
print(max(X_train[0][0]))

Output

(1883, 100, 68)
6.028588763956215

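from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Flatten each sample to (batch, step * features), scale, then restore the 3D shape.
X_train = scaler.fit_transform(X_train.reshape(X_train.shape[0], -1)).reshape(X_train.shape)
# Reuse the statistics fitted on the training set for the test set.
X_test = scaler.transform(X_test.reshape(X_test.shape[0], -1)).reshape(X_test.shape)
print(X_train.shape)
print(max(X_train[0][0]))
print(min(X_train[0][0]))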

Output

(1883, 100, 68)
3.2232538993444533
-1.9056918449890343


The values are still not between -1 and 1.

How should I handle this?
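Two observations may help here. First, StandardScaler rescales each column to zero mean and unit variance; it does not bound values to [-1, 1], so values such as 3.22 are expected. Second, reshaping to (batch, step * features) makes every (step, feature) pair its own column; to scale per feature, the array can instead be reshaped to (batch * step, features). The sketch below shows both StandardScaler and MinMaxScaler with `feature_range=(-1, 1)` on random stand-in data (the shapes and data are placeholders, not the poster's dataset).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Stand-in data with layout (batch, step, features).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(8, 10, 4))
X_test = rng.normal(loc=5.0, scale=3.0, size=(3, 10, 4))

n_features = X_train.shape[-1]

# Per-feature standardization: one column per feature, one row per time step.
std_scaler = StandardScaler()
X_train_std = std_scaler.fit_transform(
    X_train.reshape(-1, n_features)).reshape(X_train.shape)
X_test_std = std_scaler.transform(
    X_test.reshape(-1, n_features)).reshape(X_test.shape)

# Per-feature min-max scaling into [-1, 1], if a bounded range is required.
mm_scaler = MinMaxScaler(feature_range=(-1, 1))
X_train_mm = mm_scaler.fit_transform(
    X_train.reshape(-1, n_features)).reshape(X_train.shape)
X_test_mm = mm_scaler.transform(
    X_test.reshape(-1, n_features)).reshape(X_test.shape)

print(X_train_std.shape, X_train_mm.min(), X_train_mm.max())
```

Because the scalers are fitted on the training set only, test values can fall slightly outside [-1, 1] after `mm_scaler.transform`; that is normal and usually preferable to fitting on the test data.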

numpy machine-learning scikit-learn keras

Leo*_*Leo

2020-10-14

Score: 3

Answers: 1

Views: 3354
