Clo*_*ave 16 python knn scikit-learn
I have a dataset that looks like this:
1908 January 5.0 -1.4
1908 February 7.3 1.9
1908 March 6.2 0.3
1908 April NaN 2.1
1908 May NaN 7.7
1908 June 17.7 8.7
1908 July NaN 11.0
1908 August 17.5 9.7
1908 September 16.3 8.4
1908 October 14.6 8.0
1908 November 9.6 3.4
1908 December 5.8 NaN
1909 January 5.0 0.1
1909 February 5.5 -0.3
1909 March 5.6 -0.3
1909 April 12.2 3.3
1909 May 14.7 4.8
1909 June 15.0 7.5
1909 July 17.3 10.8
1909 August 18.8 10.7
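I want to replace the NaNs using KNN as the method. I looked up sklearn's Imputer class, but it only supports mean, median and mode imputation. There is a feature request here, but I don't think it has been implemented yet. Any ideas on how to replace the NaNs in the last two columns using KNN?
Edit: Since I need to run the code in another environment, I don't have the luxury of installing packages. I can only use sklearn, pandas, numpy and other standard packages.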
Mir*_*ber 17
The fancyimpute package supports this kind of imputation, using the following API:
from fancyimpute import KNN
# X is the complete data matrix
# X_incomplete has the same values as X except a subset have been replaced with NaN
# Use 3 nearest rows which have a feature to fill in each row's missing features
X_filled_knn = KNN(k=3).complete(X_incomplete)
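Here are the imputations supported by this package:
•SimpleFill: Replaces missing entries with the mean or median of each column.
•KNN: Nearest neighbor imputation which weights samples using the mean squared difference on features for which two rows both have observed data.
•SoftImpute: Matrix completion by iterative soft thresholding of SVD decompositions. Inspired by the softImpute package for R, which is based on Spectral Regularization Algorithms for Learning Large Incomplete Matrices by Mazumder et al.
•IterativeSVD: Matrix completion by iterative low-rank SVD decomposition. Should be similar to SVDimpute from Missing value estimation methods for DNA microarrays by Troyanskaya et al.
•MICE: Reimplementation of Multiple Imputation by Chained Equations.
•MatrixFactorization: Direct factorization of the incomplete matrix into low-rank U and V, with an L1 sparsity penalty on the elements of U and an L2 penalty on the elements of V. Solved by gradient descent.
•NuclearNormMinimization: Simple implementation of Exact Matrix Completion via Convex Optimization by Emmanuel Candes and Benjamin Recht using cvxpy. Too slow for large matrices.
•BiScaler: Iterative estimation of row/column means and standard deviations to get a doubly normalized matrix. Not guaranteed to converge but works well in practice. Taken from Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares.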
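Where fancyimpute cannot be installed, the KNN idea described in the list above can be approximated with numpy alone. The following is a minimal sketch, not fancyimpute's implementation: the function name `knn_impute`, the inverse-distance weighting, and the column-mean fallback are assumptions made for illustration.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaNs with a weighted average of the k nearest rows.

    The distance between two rows is the mean squared difference over
    the features that both rows have observed.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    for i in range(X.shape[0]):
        if not missing[i].any():
            continue
        # Features observed in both row i and each other row.
        shared = ~missing[i] & ~missing
        diff = np.where(shared, X - X[i], 0.0)
        n_shared = shared.sum(axis=1)
        with np.errstate(divide="ignore", invalid="ignore"):
            dist = (diff ** 2).sum(axis=1) / n_shared
        dist[n_shared == 0] = np.inf  # rows with no common features
        dist[i] = np.inf              # never use the row itself

        for j in np.flatnonzero(missing[i]):
            # Donors must observe feature j and be comparable to row i.
            donors = np.flatnonzero(~missing[:, j] & np.isfinite(dist))
            if donors.size == 0:
                # Assumed fallback: use the column mean.
                filled[i, j] = np.nanmean(X[:, j])
                continue
            nearest = donors[np.argsort(dist[donors])[:k]]
            # Assumed weighting: inverse distance (closer rows count more).
            weights = 1.0 / (dist[nearest] + 1e-8)
            filled[i, j] = np.average(X[nearest, j], weights=weights)
    return filled

X_incomplete = np.array([[5.0, -1.4], [7.3, 1.9], [np.nan, 2.1], [17.7, 8.7]])
X_filled = knn_impute(X_incomplete, k=2)
print(X_filled)
```

Only the numeric columns should be passed in; year and month labels would need to be dropped or encoded first.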
fancyimpute's KNN imputation no longer supports the complete function suggested by the other answer; we now need to use fit_transform:
from fancyimpute import KNN
# X is the complete data matrix
# X_incomplete has the same values as X except a subset have been replaced with NaN
# Use 3 nearest rows which have a feature to fill in each row's missing features
X_filled_knn = KNN(k=3).fit_transform(X_incomplete)
Reference: https://github.com/iskandr/fancyimpute
scikit-learn v0.22 supports native KNN imputation:
import numpy as np
from sklearn.impute import KNNImputer
X = [[1, 2, np.nan], [3, 4, 3], [np.nan, 6, 5], [8, 8, 7]]
imputer = KNNImputer(n_neighbors=2)
print(imputer.fit_transform(X))
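Applied to data shaped like the question's, the same imputer can be run on just the two numeric columns of a pandas DataFrame. This is a sketch under assumptions: the column names `value_a` and `value_b` are hypothetical (the question does not name its columns), and only the 1908 rows are reproduced.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer  # requires scikit-learn >= 0.22

months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
df = pd.DataFrame({
    "year": [1908] * 12,
    "month": months,
    # Hypothetical column names for the last two columns of the question.
    "value_a": [5.0, 7.3, 6.2, np.nan, np.nan, 17.7,
                np.nan, 17.5, 16.3, 14.6, 9.6, 5.8],
    "value_b": [-1.4, 1.9, 0.3, 2.1, 7.7, 8.7,
                11.0, 9.7, 8.4, 8.0, 3.4, np.nan],
})

value_cols = ["value_a", "value_b"]
imputer = KNNImputer(n_neighbors=2)
# Impute only the numeric columns; year and month are left untouched.
df[value_cols] = imputer.fit_transform(df[value_cols])
print(df)
```

Neighbors are chosen using only the columns passed to the imputer, so here each missing value is filled from the rows whose other value is closest.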