How to optimize an sklearn pipeline using XGBoost for a different `eval_metric`?

Question


sap*_*ico 5 python pipeline classification scikit-learn xgboost

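I'm trying to use XGBoost and optimize for `auc` as the `eval_metric` (as described here).

This works fine when I use the classifier directly, but fails when I try to use it in a pipeline.

What is the correct way to pass `.fit` arguments to an sklearn pipeline?

Example: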

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
from xgboost import XGBClassifier
import xgboost
import sklearn

print('sklearn version: %s' % sklearn.__version__)
print('xgboost version: %s' % xgboost.__version__)

X, y = load_iris(return_X_y=True)

# Without using the pipeline: 
xgb = XGBClassifier()
xgb.fit(X, y, eval_metric='auc')  # works fine

# Making a pipeline with this classifier and a scaler:
pipe = Pipeline([('scaler', StandardScaler()), ('classifier', XGBClassifier())])

# using the pipeline, but not optimizing for 'auc': 
pipe.fit(X, y)  # works fine

# however this does not work (even after correcting the underscores): 
pipe.fit(X, y, classifier__eval_metric='auc')  # fails


Error:
TypeError: before_fit() got an unexpected keyword argument 'classifier__eval_metric'

About the xgboost version:
`xgboost.__version__` shows 0.6
`pip3 freeze | grep xgboost` shows `xgboost==0.6a2`.

Answer 1

Viv*_*mar 4

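The error occurs because, when using the pipeline, you put a single underscore between the estimator name and its parameter. It should be two underscores.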

The documentation of `Pipeline.fit()` shows the correct way to supply parameters to `fit`:

Parameters passed to the `fit` method of each step, where each parameter name is prefixed such that parameter `p` for step `s` has key `s__p`.
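To make the quoted rule concrete, here is a minimal plain-Python sketch of how `s__p` keys can be split and routed to pipeline steps. `route_fit_params` is a hypothetical helper written only for illustration, not scikit-learn's internal code, and its error messages may differ from the real `Pipeline`. It shows why `classifier_eval_metric` (one underscore) cannot be routed, while `classifier__eval_metric` reaches the `classifier` step as `eval_metric`.

```python
def route_fit_params(step_names, fit_params):
    """Hypothetical sketch of the s__p rule; not scikit-learn's actual implementation."""
    # One (initially empty) kwargs dict per pipeline step.
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        if "__" not in key:
            # No step prefix, so there is no way to know which step the key belongs to.
            raise ValueError(f"{key!r} has no '<step>__' prefix")
        # Split at the first double underscore: step name on the left, param on the right.
        step, param = key.split("__", 1)
        if step not in routed:
            raise ValueError(f"unknown pipeline step {step!r}")
        routed[step][param] = value
    return routed


routed = route_fit_params(["scaler", "classifier"], {"classifier__eval_metric": "auc"})
print(routed)  # {'scaler': {}, 'classifier': {'eval_metric': 'auc'}}

# A single underscore cannot be attributed to any step, so it is rejected.
try:
    route_fit_params(["scaler", "classifier"], {"classifier_eval_metric": "auc"})
except ValueError as exc:
    print("rejected:", exc)
```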

So in your case, the correct usage is:

pipe.fit(X_train, y_train, classifier__eval_metric='auc')


(Note the two underscores between the step name and the parameter.)

Archived:	8 years, 7 months ago
Views:	5044
Last activity:	7 years, 1 month ago