Tags: python, numpy, scikit-learn
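from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Load dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Forest used only to rank features for the selection step
rf_feature_imp = RandomForestClassifier(n_estimators=100)
feat_selection = SelectFromModel(rf_feature_imp, threshold=0.5)

# Final classifier, trained on the selected features
clf = RandomForestClassifier(n_estimators=5000)

model = Pipeline([
    ('fs', feat_selection),
    ('clf', clf),
])

# 'auto' (equivalent to 'sqrt' for classifiers) was removed in scikit-learn 1.3
params = {
    'fs__threshold': [0.5, 0.3, 0.7],
    'fs__estimator__max_features': ['sqrt', 'log2'],
    'clf__max_features': ['sqrt', 'log2'],
}

gs = GridSearchCV(model, params, ...)  # remaining arguments elided in the original
gs.fit(X, y)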
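The code above is based on "Ensuring right order of operations in random forest classification in scikit learn".
Since I am using SelectFromModel, I would like to print the names of the selected features (from the SelectFromModel step of the pipeline), but I am not sure how to extract them.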
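One way is to call transform() of the feature selector on the feature names, but the feature names have to be passed in the form of a list of examples.
First, you have to get the feature selection stage from the best estimator found by GridSearchCV.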
fs = gs.best_estimator_.named_steps['fs']
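As a self-contained sketch of this first step, the snippet below fits a small grid search and pulls the fitted selector out of the best pipeline. The reduced grid, the string thresholds 'mean' and 'median', the small n_estimators, cv=3 and random_state=0 are all assumptions chosen so that the example runs quickly and always keeps at least one feature; they are not the settings from the question.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

iris = datasets.load_iris()
X, y = iris.data, iris.target

model = Pipeline([
    ('fs', SelectFromModel(RandomForestClassifier(n_estimators=20, random_state=0))),
    ('clf', RandomForestClassifier(n_estimators=20, random_state=0)),
])

# Assumed grid: 'mean' and 'median' thresholds always keep at least one feature.
params = {'fs__threshold': ['mean', 'median']}

gs = GridSearchCV(model, params, cv=3)
gs.fit(X, y)

# With the default refit=True, best_estimator_ is the pipeline refit on all
# of the data using the best parameters, so its steps are already fitted.
fs = gs.best_estimator_.named_steps['fs']

print(gs.best_params_)
print(fs.threshold_)                       # numeric threshold actually applied
print(fs.estimator_.feature_importances_)  # importances from the fitted forest
```

Looking at threshold_ next to the importances can help explain why a given feature was kept or dropped.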
Create a list of examples from feature_names:
feature_names_example = [iris.feature_names]
Transform this example using the feature selector.
selected_features = fs.transform(feature_names_example)
print(selected_features[0])  # Select the one example
# ['sepal length (cm)' 'petal length (cm)' 'petal width (cm)']
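SelectFromModel has a get_support() method that returns a boolean mask of the selected features. So you can do this (in addition to the preliminary step described by @David Maust):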
import numpy as np

feature_names = np.array(iris.feature_names)
selected_features = feature_names[fs.get_support()]
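For a version that runs on its own, here is a minimal sketch of the mask-based approach applied to a pipeline fitted directly, without a grid search. The helper name selected_feature_names is hypothetical, and threshold='mean', n_estimators=20 and random_state=0 are assumptions made only to keep the example small and repeatable.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline


def selected_feature_names(selector, feature_names):
    """Return the names kept by a fitted selector (hypothetical helper)."""
    mask = selector.get_support()  # boolean array, one entry per input feature
    return np.asarray(feature_names)[mask]


iris = datasets.load_iris()

pipe = Pipeline([
    ('fs', SelectFromModel(
        RandomForestClassifier(n_estimators=20, random_state=0),
        threshold='mean',  # assumed threshold; keeps at least one feature
    )),
    ('clf', RandomForestClassifier(n_estimators=20, random_state=0)),
])
pipe.fit(iris.data, iris.target)

fs = pipe.named_steps['fs']
mask = fs.get_support()
names = selected_feature_names(fs, iris.feature_names)

print(mask)
print(names)
```

If column positions are more convenient than a mask, get_support(indices=True) returns the integer indices of the selected features instead.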