Cross-validated feature selection with the ANOVA test using scikit-learn

Question


HHH*_*HHH 1 python feature-selection scikit-learn

I am using scikit-learn for feature selection. Here is my code:

from sklearn.feature_selection import GenericUnivariateSelect
from sklearn.feature_selection import f_classif


# mode is keyword-only in current scikit-learn releases; param sets k in 'k_best' mode
scores = GenericUnivariateSelect(f_classif, mode='k_best').fit(features_pd, target_pd)


How can I use f_classif in a cross-validated (CV) way so that the results are more reliable?

Answer 1

Moh*_*hif 5

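Scikit-learn provides a selection method that combines recursive feature elimination with cross-validation, called RFECV. The following code is for reference only and is similar to the example given at the link.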

import matplotlib.pyplot as plt
from sklearn.svm import SVC
# sklearn.cross_validation was removed; StratifiedKFold now lives in model_selection
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import RFECV

# features (2-D NumPy array) and labels (1-D array) are assumed to be defined already
svc = SVC(kernel="linear")
rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(n_splits=50),
              scoring='precision')
rfecv.fit(features, labels)
print("Optimal number of features : %d" % rfecv.n_features_)
print(rfecv.support_)
# Keep only the columns that RFECV selected
features = features[:, rfecv.support_]
# Plot number of features vs. cross-validation scores
# (grid_scores_ was removed in scikit-learn 1.2; cv_results_ replaces it)
mean_scores = rfecv.cv_results_["mean_test_score"]
plt.figure()
plt.xlabel("Number of features selected")
plt.ylabel("Cross-validation score (precision)")
plt.plot(range(1, len(mean_scores) + 1), mean_scores)
plt.show()



Edit: selecting features with CV using the ANOVA test

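To combine the ANOVA test with cross-validation, you need Pipeline, SelectPercentile and cross_val_score. Following the example given here, you can combine these to do feature selection with CV + the ANOVA test. Putting the selector inside the pipeline matters: the ANOVA F-scores are then recomputed on the training portion of each fold, so the held-out fold never influences which features are chosen.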
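As a rough sketch of that combination (not the code from the linked example), the snippet below cross-validates a pipeline of SelectPercentile(f_classif) followed by a linear SVC for a few candidate percentiles. The synthetic data from make_classification, the percentile grid, the 5-fold split and the names `mean_scores` and `best_percentile` are all assumptions made for illustration; substitute your own `features` and `labels`.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): replace with your own features/labels.
features, labels = make_classification(
    n_samples=200, n_features=30, n_informative=5, random_state=0
)

# Assumed CV scheme: 5 stratified, shuffled folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

mean_scores = {}
for percentile in (10, 25, 50, 100):  # assumed candidate values
    # The ANOVA selector is a pipeline step, so it is refit on the
    # training part of every fold before the classifier is trained.
    pipe = Pipeline([
        ("anova", SelectPercentile(f_classif, percentile=percentile)),
        ("svc", SVC(kernel="linear")),
    ])
    scores = cross_val_score(pipe, features, labels, cv=cv)
    mean_scores[percentile] = scores.mean()
    print("percentile=%3d  mean CV accuracy=%.3f" % (percentile, scores.mean()))

# Percentile with the highest mean cross-validated accuracy.
best_percentile = max(mean_scores, key=mean_scores.get)
print("Best percentile:", best_percentile)
```

Once a percentile is chosen, refitting the same pipeline on all of the data and reading `pipe.named_steps["anova"].get_support()` gives the mask of features that were kept.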
