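I am using Scikit-learn's RFECV to select the most important features for a logistic regression, using cross-validation. Assume X is an [n, x] DataFrame of features and y is the response variable: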
from sklearn.pipeline import make_pipeline
# sklearn.grid_search and sklearn.cross_validation were removed in
# scikit-learn 0.20; their contents now live in sklearn.model_selection
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold
from sklearn import preprocessing
from sklearn.feature_selection import RFECV
import sklearn
import sklearn.linear_model as lm
import sklearn.model_selection as gs
# Create a logistic regression estimator
logreg = lm.LogisticRegression()
# Use RFECV to pick the best features, using stratified 3-fold cross-validation
# (the splitter no longer takes y; it receives y when rfecv.fit is called)
rfecv = RFECV(estimator=logreg, cv=StratifiedKFold(n_splits=3), scoring='roc_auc')
# Fit the features to the response variable
rfecv.fit(X, y)
# Put the best features into new df X_new
X_new = rfecv.transform(X)
# …
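For reference, here is a minimal, self-contained sketch of the same setup on synthetic data, showing how to see which columns RFECV kept. The data from `make_classification`, the column names `f0`..`f7`, and `max_iter=1000` are assumptions made for illustration and are not part of my real data. Note that `rfecv.transform(X)` returns a NumPy array by default, so the sketch indexes the DataFrame with the `support_` mask to keep the column names.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for X and y (hypothetical data, not the real dataset)
X_arr, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(8)])

# Same estimator and CV scheme as in the question
rfecv = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    cv=StratifiedKFold(n_splits=3),
    scoring="roc_auc",
)
rfecv.fit(X, y)

# support_ is a boolean mask over the input columns (True = kept)
selected = X.columns[rfecv.support_]

# Index the DataFrame with the selected names so the result keeps its labels
X_new = X.loc[:, selected]

print("Number of features kept:", rfecv.n_features_)
print("Selected columns:", list(selected))
print("Ranking (1 = selected):", rfecv.ranking_)
```

Here `ranking_` gives 1 for every selected feature and larger numbers for features that were eliminated earlier.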