How to normalize data that contains NaN values in Python

Question

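My data contains some null values, and I want to fill them in using KNN imputation. For the imputation to work well, I want to normalize the data first.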

from sklearn.preprocessing import Normalizer
normalizer = Normalizer()
normalizer.fit_transform(data[num_cols])  # num_cols: the columns with numeric values

Error: Input contains NaN, infinity or a value too large for dtype('float64').

So how do I normalize data that contains NaN?

Answer 1

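I'd suggest not using sklearn's Normalizer here, because it rejects input that contains NaN. You can simply use the code below to min-max scale a column with pandas instead.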

df['col'] = (df['col'] - df['col'].min()) / (df['col'].max() - df['col'].min())

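The approach above ignores NaN when normalizing: pandas min() and max() skip NaN by default, so the non-missing values are scaled to the range [0, 1] and the missing entries stay NaN, ready to be imputed afterwards.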
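To tie this back to the original goal, here is a minimal sketch that applies the same min-max scaling to several numeric columns at once and then fills the gaps with sklearn's KNNImputer. The toy DataFrame, its column names, and n_neighbors=2 are assumptions made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical toy data; the column names are invented for this example.
df = pd.DataFrame({
    "height": [150.0, 160.0, np.nan, 180.0],
    "weight": [50.0, np.nan, 70.0, 80.0],
})
num_cols = ["height", "weight"]

# Column-wise min-max scaling. pandas min()/max() skip NaN by default,
# so missing entries stay NaN and everything else lands in [0, 1].
col_min = df[num_cols].min()
col_max = df[num_cols].max()
scaled = (df[num_cols] - col_min) / (col_max - col_min)

# Fill the remaining NaN values from the nearest rows in the scaled space.
# n_neighbors=2 is an arbitrary choice for this tiny example.
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(
    imputer.fit_transform(scaled), columns=num_cols, index=df.index
)
```

Scaling before imputing matters because KNN relies on distances between rows, so a column with a large numeric range would otherwise dominate the choice of neighbours. Note that a constant column would make max() - min() zero and needs separate handling.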