Ole*_*siy · 2 · python, feature-selection, lightgbm
I fitted a basic LGBM model in Python.
# Create an instance
LGBM = LGBMRegressor(random_state=123, importance_type='gain')  # 'split' can also be selected here

# Fit the model (subset of data)
LGBM.fit(X_train_subset, y_train_subset)

# Predict y_pred
y_pred = LGBM.predict(X_test)

I was looking at the documentation:
importance_type (string, optional (default="split")) – How the importance is calculated. If "split", the result contains the number of times the feature is used in the model. If "gain", the result contains the total gain of the splits which use the feature.
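The two modes can be illustrated with a schematic toy (not LightGBM's internals): suppose the fitted trees contain a list of splits, each recording the feature used and the gain achieved. 'split' importance counts occurrences per feature, while 'gain' importance sums the gains:

```python
# Schematic list of splits: (feature used, gain achieved by the split).
splits = [("home", 12.0), ("PA_avg", 3.5), ("home", 8.0)]

split_importance = {}  # importance_type='split': times each feature is used
gain_importance = {}   # importance_type='gain': total gain of its splits
for feature, gain in splits:
    split_importance[feature] = split_importance.get(feature, 0) + 1
    gain_importance[feature] = gain_importance.get(feature, 0.0) + gain

print(split_importance)  # {'home': 2, 'PA_avg': 1}
print(gain_importance)   # {'home': 20.0, 'PA_avg': 3.5}
```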
I used gain, and it printed the total gains for me.
# Print features by importance
pd.DataFrame([X_train.columns, LGBM.feature_importances_]).T.sort_values([1], ascending=[True])

         0                     1

59       SLG_avg_p             0
4        PA_avg                2995.8
0        home                  5198.55
26       next_home             11824.2
67       first_time_pitcher    15042.1
etc

I tried:
# get importance
importance = LGBM.feature_importances_
# summarize feature importance
for i, v in enumerate(importance):
    print('Feature: %0d, Score: %.5f' % (i, v))
# plot feature importance
plt.bar([x for x in range(len(importance))], importance)
plt.show()

and received the values and a plot:
Feature: 0, Score: 5198.55005
Feature: 1, Score: 20688.87198
Feature: 2, Score: 49147.90228
Feature: 3, Score: 71734.03088
etc

I also tried:
# feature importance
print(LGBM.feature_importances_)
# plot
plt.bar(range(len(LGBM.feature_importances_)), LGBM.feature_importances_)
plt.show()

How can I print the importances as percentages for this model? For some reason I was sure they would be calculated automatically.
The percentage option is available in the R version, but not in the Python one. In Python you can do the following (using a made-up example, since I don't have your data):
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt
from lightgbm import LGBMRegressor
import pandas as pd
X, y = make_regression(n_samples=1000, n_features=10, n_informative=10, random_state=1)
feature_names = [f'Feature {i+1}' for i in range(10)]
X = pd.DataFrame(X, columns=feature_names)
model = LGBMRegressor(importance_type='gain')
model.fit(X, y)
feature_importances = (model.feature_importances_ / sum(model.feature_importances_)) * 100
results = pd.DataFrame({'Features': feature_names,
'Importances': feature_importances})
results.sort_values(by='Importances', inplace=True)
plt.barh(results['Features'], results['Importances'])
plt.xlabel('Importance percentages')
plt.show()
Output: (a horizontal bar chart of the importance percentages)
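Since the percentages are just the raw gains divided by their sum, a quick sanity check (using made-up gain values in place of `model.feature_importances_`) confirms that the normalized values sum to 100:

```python
# Hypothetical raw gain values standing in for model.feature_importances_
raw_gains = [5198.55, 20688.87, 49147.90, 71734.03]

total = sum(raw_gains)
percentages = [100 * g / total for g in raw_gains]

# Each entry is that feature's share of the total gain, in percent
print([round(p, 2) for p in percentages])
```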