Missing value imputation in Python using KNN

Clo*_*ave 16 python knn scikit-learn

I have a dataset that looks like this:

1908    January 5.0 -1.4
1908    February    7.3 1.9
1908    March   6.2 0.3
1908    April   NaN   2.1
1908    May NaN   7.7
1908    June    17.7    8.7
1908    July    NaN   11.0
1908    August  17.5    9.7
1908    September   16.3    8.4
1908    October 14.6    8.0
1908    November    9.6 3.4
1908    December    5.8 NaN
1909    January 5.0 0.1
1909    February    5.5 -0.3
1909    March   5.6 -0.3
1909    April   12.2    3.3
1909    May 14.7    4.8
1909    June    15.0    7.5
1909    July    17.3    10.8
1909    August  18.8    10.7  

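I want to replace the NaNs using KNN as the method. I looked up sklearn's Imputer class, but it only supports mean, median, and mode imputation. There is a feature request here, but I don't think it has been implemented yet. Any ideas on how to replace the NaNs in the last two columns using KNN?

Edit: Since I need to run the code in another environment, I don't have the luxury of installing packages. I can only use sklearn, pandas, numpy, and other standard packages.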

Mir*_*ber 17

The fancyimpute package supports this kind of imputation, using the following API:

from fancyimpute import KNN
# X is the complete data matrix
# X_incomplete has the same values as X except a subset have been replaced with NaN

# Use 3 nearest rows which have a feature to fill in each row's missing features
X_filled_knn = KNN(k=3).complete(X_incomplete)

Here are the imputation methods this package supports:

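• SimpleFill: Replaces missing entries with the mean or median of each column.

• KNN: Nearest-neighbor imputation that weights samples using the mean squared difference on features for which two rows both have observed data.

• SoftImpute: Matrix completion by iterative soft thresholding of SVD decompositions. Inspired by the softImpute package for R, which is based on "Spectral Regularization Algorithms for Learning Large Incomplete Matrices" by Mazumder et al.

• IterativeSVD: Matrix completion by iterative low-rank SVD decomposition. Should be similar to SVDimpute from "Missing value estimation methods for DNA microarrays" by Troyanskaya et al.

• MICE: Reimplementation of Multiple Imputation by Chained Equations.

• MatrixFactorization: Direct factorization of the incomplete matrix into low-rank U and V, with an L1 sparsity penalty on the elements of U and an L2 penalty on the elements of V. Solved by gradient descent.

• NuclearNormMinimization: Simple implementation of "Exact Matrix Completion via Convex Optimization" by Emmanuel Candes and Benjamin Recht, using cvxpy. Too slow for large matrices.

• BiScaler: Iterative estimation of row/column means and standard deviations to get a doubly normalized matrix. Not guaranteed to converge, but works well in practice. Taken from "Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares".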

  • @ClockSlave Then you can look at fancyimpute's code and implement it yourself. (5 upvotes)
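For anyone limited to numpy, here is a rough sketch of what "implementing it yourself" could look like. It is not fancyimpute's code: `knn_impute` is a hypothetical helper, the distance is assumed to be the mean squared difference over features both rows have observed, and neighbors are averaged without weighting (fancyimpute additionally weights them).

```python
import numpy as np

def knn_impute(X, k=3):
    """Hypothetical helper: fill NaNs with the mean of the k nearest rows."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()
    for i in np.where(missing.any(axis=1))[0]:
        # Features observed in both row i and each other row
        shared = ~missing[i] & ~missing
        # Differences on shared features only; everything else counts as 0
        diff = np.where(shared, X - X[i], 0.0)
        n_shared = shared.sum(axis=1)
        # Mean squared difference; rows sharing no feature get distance inf
        with np.errstate(divide="ignore", invalid="ignore"):
            dist = np.where(n_shared > 0, (diff ** 2).sum(axis=1) / n_shared, np.inf)
        dist[i] = np.inf  # a row is never its own neighbor
        for j in np.where(missing[i])[0]:
            # Donors must have feature j observed and a finite distance to row i
            donors = np.where(~missing[:, j] & np.isfinite(dist))[0]
            if donors.size == 0:
                continue  # nothing to borrow from, leave the NaN in place
            nearest = donors[np.argsort(dist[donors], kind="stable")[:k]]
            filled[i, j] = X[nearest, j].mean()
    return filled

X_incomplete = [[1, 2, np.nan], [3, 4, 3], [np.nan, 6, 5], [8, 8, 7]]
X_filled = knn_impute(X_incomplete, k=2)
print(X_filled)
```

Only the two numeric columns should be passed in; the year and month columns would otherwise dominate or break the distance computation.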

Raj*_*ddy 5

fancyimpute's KNN imputation no longer supports the complete function suggested in the other answer; we now need to use fit_transform instead.

from fancyimpute import KNN

# X is the complete data matrix
# X_incomplete has the same values as X except a subset have been replaced with NaN
# Use 3 nearest rows which have a feature to fill in each row's missing features
X_filled_knn = KNN(k=3).fit_transform(X_incomplete)

Reference: https://github.com/iskandr/fancyimpute


amr*_*rrs 5

scikit-learn v0.22 and later natively supports KNN imputation:

import numpy as np
from sklearn.impute import KNNImputer

X = [[1, 2, np.nan], [3, 4, 3], [np.nan, 6, 5], [8, 8, 7]]
imputer = KNNImputer(n_neighbors=2)
print(imputer.fit_transform(X))
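Applied to data shaped like the question's, this might look roughly as follows. The column names (`year`, `month`, `value_1`, `value_2`) are made up for illustration, since the question does not name them, and only the first six rows are included. The imputer is run on the two numeric columns only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# First rows of the question's data; column names are assumptions
df = pd.DataFrame(
    {
        "year": [1908] * 6,
        "month": ["January", "February", "March", "April", "May", "June"],
        "value_1": [5.0, 7.3, 6.2, np.nan, np.nan, 17.7],
        "value_2": [-1.4, 1.9, 0.3, 2.1, 7.7, 8.7],
    }
)

# Impute only the numeric measurement columns, leaving year/month untouched
value_cols = ["value_1", "value_2"]
imputer = KNNImputer(n_neighbors=2)
df[value_cols] = imputer.fit_transform(df[value_cols])
print(df)
```

Note that this treats rows as unordered points; it does not use the fact that the data is a monthly time series.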