python machine-learning matplotlib feature-selection xgboost
When I plot the feature importances, the figure comes out as a mess. I have more than 7,000 variables. I understand that the built-in plot only keeps the most important features, but the resulting figure is still unreadable. Here is the complete code:
import numpy as np
import pandas as pd
df = pd.read_csv('ricerice.csv')
array=df.values
X = array[:,0:7803]
Y = array[:,7804]
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
seed=0
test_size=0.30
X_train, X_test, y_train, y_test = train_test_split(X,Y,test_size=test_size, random_state=seed)
from xgboost import XGBClassifier
model = XGBClassifier()
model.fit(X, Y)
import matplotlib.pyplot as plt
from matplotlib import pyplot
from xgboost import plot_importance
fig1=plt.gcf()
plot_importance(model)
plt.draw()
fig1.savefig('xgboost.png', figsize=(50, 40), dpi=1000)
Two points:

1. The model should be fitted on the training data (X_train, y_train), not on the entire dataset (X, Y); otherwise the train/test split serves no purpose.
2. The max_num_features parameter of plot_importance() limits the plot to the top max_num_features features (e.g. the top 10), which keeps the figure readable.

With these modifications applied to the code, and using some randomly generated data for demonstration, the code is as follows:
import numpy as np
# generate some random data for demonstration purpose, use your original dataset here
X = np.random.rand(1000,100) # 1000 x 100 data
y = np.random.rand(1000).round() # 0, 1 labels
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
seed=0
test_size=0.30
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=test_size, random_state=seed)
model = XGBClassifier()
model.fit(X_train, y_train)
import matplotlib.pyplot as plt
from xgboost import plot_importance
plot_importance(model, max_num_features=10) # top 10 most important features
plt.show()
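One more detail worth mentioning: in the original code, figsize was passed to savefig(), but figsize is not a savefig() parameter; it belongs to plt.subplots() or plt.figure(), so it has no effect there. As a minimal sketch, assuming the fitted model from above, you can create a suitably sized figure first, hand its axes to plot_importance() via the ax parameter, and then save it:

import matplotlib.pyplot as plt
from xgboost import plot_importance

# figsize belongs to the figure, not to savefig(): create appropriately sized axes first
fig, ax = plt.subplots(figsize=(10, 8))

# draw only the ten highest-scoring features on the prepared axes
plot_importance(model, max_num_features=10, ax=ax)

fig.tight_layout()
fig.savefig('xgboost.png', dpi=300)  # dpi controls the resolution of the saved file
plt.show()

As a side note, if the classifier is fitted on the pandas DataFrame itself instead of df.values, recent xgboost versions keep the column names, so the bars are labelled with your variable names rather than f0, f1, … (worth verifying against your installed xgboost version).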