How to label feature importances with a forest of trees?

Question

Ele*_*hys 8 python numpy matplotlib scikit-learn sklearn-pandas

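I am using sklearn to plot the feature importances of a forest of trees. The DataFrame is called "heart". Here is the code that extracts the sorted list of features: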

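import numpy as np

# extc is the fitted forest; heart_train is the training DataFrame
importances = extc.feature_importances_
# positions of the features, sorted from most to least important
indices = np.argsort(importances)[::-1]
print("Feature ranking:")

for f in range(heart_train.shape[1]):
    print("%d. feature %d (%f)" % (f + 1, indices[f], importances[indices[f]]))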
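A minimal sketch of an alternative way to build the ranking, assuming the training data is a pandas DataFrame: wrapping the importances in a pd.Series indexed by the column names lets pandas do the sorting, so each importance stays attached to its feature name. The column names and importance values below are hypothetical stand-ins; in the question they would come from heart_train.columns and extc.feature_importances_.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: in the question these would be
# heart_train.columns and extc.feature_importances_.
columns = ["age", "chol", "thalach", "oldpeak"]
importances = np.array([0.10, 0.20, 0.45, 0.25])

# Index the importances by feature name, then sort from most to least important.
ranking = pd.Series(importances, index=columns).sort_values(ascending=False)

print("Feature ranking:")
for rank, (name, score) in enumerate(ranking.items(), start=1):
    print("%d. feature %s (%f)" % (rank, name, score))
```

Because the names travel with the values, ranking.index can be passed straight to the tick labels and ranking.values to the bar heights.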

Then I plot the list like this:

import matplotlib.pyplot as plt

f, ax = plt.subplots(figsize=(11, 9))
plt.title("Feature ranking", fontsize=20)
# one bar per feature, ordered by decreasing importance
plt.bar(range(heart_train.shape[1]), importances[indices],
        color="b",
        align="center")
# the tick labels here are the integer positions in indices, not column names
plt.xticks(range(heart_train.shape[1]), indices)
plt.xlim([-1, heart_train.shape[1]])
plt.ylabel("importance", fontsize=18)
plt.xlabel("index of the feature", fontsize=18)
plt.show()

I get a plot like this: [image not preserved]

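My question is: how can I replace the feature NUMBER with the feature name, so that the plot is easier to understand? I tried converting the strings containing the feature names (the name of each column of the DataFrame), but I couldn't get it to work.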

Thanks

Answer 1

bak*_*kal 3

The problem is here:

plt.xticks(range(heart_train.shape[1]), indices)

indices is the array of positions returned by np.argsort(importances)[::-1]. It does not contain the feature names that you want to show as tick labels on the X axis.
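To make this concrete, here is a small sketch with made-up numbers and hypothetical feature names: np.argsort returns integer positions, and indexing the array of names with those same positions yields the names in the same order as the sorted bars.

```python
import numpy as np

# Made-up importances and hypothetical names, in DataFrame column order.
importances = np.array([0.10, 0.20, 0.45, 0.25])
feature_names = np.array(["age", "chol", "thalach", "oldpeak"])

indices = np.argsort(importances)[::-1]
print(indices)                 # [2 3 1 0]: positions, not names
print(feature_names[indices])  # ['thalach' 'oldpeak' 'chol' 'age']
```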

You need something like this, assuming df is your Pandas DataFrame:

feature_names = df.columns  # e.g. ['A', 'B', 'C', 'D', 'E']
# reorder the names with indices so each label matches its sorted bar
plt.xticks(range(heart_train.shape[1]), feature_names[indices])
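An end-to-end sketch putting the pieces together. Since the real "heart" data is not available here, it uses synthetic data from make_classification with hypothetical column names f0 to f4, and it assumes extc is an ExtraTreesClassifier (the question only shows the variable name). The key line is the plt.xticks call, which labels each bar with the column name reordered by indices.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove to use an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for the question's "heart" data (hypothetical column names).
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
heart_train = pd.DataFrame(X, columns=["f%d" % i for i in range(X.shape[1])])

extc = ExtraTreesClassifier(n_estimators=100, random_state=0)
extc.fit(heart_train, y)

importances = extc.feature_importances_
indices = np.argsort(importances)[::-1]
# Column names in the same (descending importance) order as the bars.
feature_names = heart_train.columns[indices]

n = heart_train.shape[1]
plt.figure(figsize=(11, 9))
plt.title("Feature ranking", fontsize=20)
plt.bar(range(n), importances[indices], color="b", align="center")
plt.xticks(range(n), feature_names, rotation=45)
plt.xlim([-1, n])
plt.ylabel("importance", fontsize=18)
plt.xlabel("feature", fontsize=18)
plt.tight_layout()
plt.savefig("feature_ranking.png")
```

Rotating the labels is optional; it only helps when the real column names are long.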

Asked:	9 years, 8 months ago
Views:	5380
Last activity:	7 years, 7 months ago