jax*_*jax 3 python machine-learning scikit-learn scikit-image
I will explain the scenario with a sample of data:
Example dataset:
GA_ID PN_ID PC_ID MBP_ID GR_ID AP_ID class
0.033 6.652 6.681 0.194 0.874 3.177 0
0.034 9.039 6.224 0.194 1.137 0 0
0.035 10.936 10.304 1.015 0.911 4.9 1
0.022 10.11 9.603 1.374 0.848 4.566 1
0.035 2.963 17.156 0.599 0.823 9.406 1
0.033 10.872 10.244 1.015 0.574 4.871 1
0.035 21.694 22.389 1.015 0.859 9.259 1
0.035 10.936 10.304 1.015 0.911 4.9 1
0.035 10.936 10.304 1.015 0.911 4.9 1
0.035 10.936 10.304 1.015 0.911 4.9 0
0.036 1.373 12.034 0.35 0.259 5.723 0
0.033 9.831 9.338 0.35 0.919 4.44 0
Feature selection step 1 and its outcome: VarianceThreshold
PN_ID PC_ID MBP_ID GR_ID AP_ID class
6.652 6.681 0.194 0.874 3.177 0
9.039 6.224 0.194 1.137 0 0
10.936 10.304 1.015 0.911 4.9 1
10.11 9.603 1.374 0.848 4.566 1
2.963 17.156 0.599 0.823 9.406 1
10.872 10.244 1.015 0.574 4.871 1
21.694 22.389 1.015 0.859 9.259 1
10.936 10.304 1.015 0.911 4.9 1
10.936 10.304 1.015 0.911 4.9 1
10.936 10.304 1.015 0.911 4.9 0
1.373 12.034 0.35 0.259 5.723 0
9.831 9.338 0.35 0.919 4.44 0
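For this first step, a minimal sketch of how the kept and dropped column names could be recovered, assuming the features are held in a pandas DataFrame. `VarianceThreshold.get_support()` returns a boolean mask over the input columns, which can be applied to `df.columns`. The threshold of 0.01 is an assumed value picked for illustration; it is not given in the post.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Feature columns copied from the example dataset above (class label left out).
df = pd.DataFrame({
    "GA_ID":  [0.033, 0.034, 0.035, 0.022, 0.035, 0.033,
               0.035, 0.035, 0.035, 0.035, 0.036, 0.033],
    "PN_ID":  [6.652, 9.039, 10.936, 10.11, 2.963, 10.872,
               21.694, 10.936, 10.936, 10.936, 1.373, 9.831],
    "PC_ID":  [6.681, 6.224, 10.304, 9.603, 17.156, 10.244,
               22.389, 10.304, 10.304, 10.304, 12.034, 9.338],
    "MBP_ID": [0.194, 0.194, 1.015, 1.374, 0.599, 1.015,
               1.015, 1.015, 1.015, 1.015, 0.35, 0.35],
    "GR_ID":  [0.874, 1.137, 0.911, 0.848, 0.823, 0.574,
               0.859, 0.911, 0.911, 0.911, 0.259, 0.919],
    "AP_ID":  [3.177, 0, 4.9, 4.566, 9.406, 4.871,
               9.259, 4.9, 4.9, 4.9, 5.723, 4.44],
})

# threshold=0.01 is an assumption made for this illustration.
selector = VarianceThreshold(threshold=0.01)
selector.fit(df)

mask = selector.get_support()        # boolean array, True = column kept
kept = df.columns[mask].tolist()     # names of the surviving features
dropped = df.columns[~mask].tolist() # names of the removed features
print("kept:", kept)
print("dropped:", dropped)
```

With this assumed threshold, only GA_ID has a variance low enough to be removed, which matches the table above.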
Feature selection step 2 and its outcome: tree-based feature selection (e.g. from sklearn.ensemble import ExtraTreesClassifier)
PN_ID MBP_ID GR_ID AP_ID class
6.652 0.194 0.874 3.177 0
9.039 0.194 1.137 0 0
10.936 1.015 0.911 4.9 1
10.11 1.374 0.848 4.566 1
2.963 0.599 0.823 9.406 1
10.872 1.015 0.574 4.871 1
21.694 1.015 0.859 9.259 1
10.936 1.015 0.911 4.9 1
10.936 1.015 0.911 4.9 1
10.936 1.015 0.911 4.9 0
1.373 0.35 0.259 5.723 0
9.831 0.35 0.919 4.44 0
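From this we can conclude that we started with 6 columns (features) and one class label, and that the last step reduced them to 4 features and one class label. The GA_ID and PC_ID columns were removed, and the model was built using the PN_ID, MBP_ID, GR_ID and AP_ID features.
Unfortunately, when I perform feature selection with the methods available in the scikit-learn library, I find that they only return the shape of the data and the reduced data, without the names of the selected and omitted features.
I have written a lot of clumsy Python code (I am not a very experienced programmer) to find the answer, but without success.
Please suggest a way to solve this. Thanks.
(Note: for this post I never ran any feature selection method on the example dataset above; I removed the columns at random to illustrate the case.)
Perhaps this code and its commented explanation will help (adapted from here).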
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Build a classification task using 3 informative features
X, y = make_classification(n_samples=1000,
                           n_features=10,
                           n_informative=3,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

# Build a forest and compute the feature importances
forest = ExtraTreesClassifier(n_estimators=250,
                              random_state=0)
forest.fit(X, y)

# Array with the importance of each feature
importances = forest.feature_importances_
# Index array with one entry per feature (column position in X)
idx = np.arange(0, X.shape[1])
# Only keep features whose importance is greater than the mean importance
features_to_keep = idx[importances > np.mean(importances)]
# Should hold roughly 3 indices, since 3 features are informative
print(features_to_keep.shape)

# Pull the X columns corresponding to the most important features
x_feature_selected = X[:, features_to_keep]
print(x_feature_selected)
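To turn the selected positions into feature names, one option is to keep the data in a pandas DataFrame and apply the selection mask to its columns. The sketch below uses `SelectFromModel` with `threshold="mean"`, which keeps features whose importance is greater than or equal to the mean (a slightly different comparison from the strict `>` used above). The column names `f0` to `f9` are hypothetical placeholders; with real data they would be the DataFrame's own column names.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=1000, n_features=10, n_informative=3,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           random_state=0, shuffle=False)

# Hypothetical column names, used here only for illustration.
names = ["f{}".format(i) for i in range(X.shape[1])]
df = pd.DataFrame(X, columns=names)

# Fit the forest inside SelectFromModel and keep features whose
# importance is at least the mean importance.
forest = ExtraTreesClassifier(n_estimators=250, random_state=0)
selector = SelectFromModel(forest, threshold="mean")
selector.fit(df, y)

mask = selector.get_support()          # boolean array, True = feature kept
selected = df.columns[mask].tolist()   # names of the selected features
omitted = df.columns[~mask].tolist()   # names of the omitted features
print("selected:", selected)
print("omitted:", omitted)

# Reduced data that still carries its column names.
df_selected = df.loc[:, mask]
```

The same mask-to-names step works for any scikit-learn selector that provides `get_support()`, so both selection steps in the question can report which columns they kept.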