Question by Geo*_*ler (score 8). Tags: python, pipeline, kwargs, scikit-learn, xgboost
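Similar to "How to pass a parameter to only one part of a pipeline object in scikit-learn?", I want to pass parameters to only one step of a pipeline. Normally this should work fine, like: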
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

estimator = XGBClassifier()
pipeline = Pipeline([
    ('clf', estimator)
])
and then calling
pipeline.fit(X_train, y_train, clf__early_stopping_rounds=20)
But it fails with:
/usr/local/lib/python3.5/site-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)
114 """
115 Xt, yt, fit_params = self._pre_transform(X, y, **fit_params)
--> 116 self.steps[-1][-1].fit(Xt, yt, **fit_params)
117 return self
118
/usr/local/lib/python3.5/site-packages/xgboost-0.6-py3.5.egg/xgboost/sklearn.py in fit(self, X, y, sample_weight, eval_set, eval_metric, early_stopping_rounds, verbose)
443 early_stopping_rounds=early_stopping_rounds,
444 evals_result=evals_result, obj=obj, feval=feval,
--> 445 verbose_eval=verbose)
446
447 self.objective = xgb_options["objective"]
/usr/local/lib/python3.5/site-packages/xgboost-0.6-py3.5.egg/xgboost/training.py in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, learning_rates, xgb_model, callbacks)
201 evals=evals,
202 obj=obj, feval=feval,
--> 203 xgb_model=xgb_model, callbacks=callbacks)
204
205
/usr/local/lib/python3.5/site-packages/xgboost-0.6-py3.5.egg/xgboost/training.py in _train_internal(params, dtrain, num_boost_round, evals, obj, feval, xgb_model, callbacks)
97 end_iteration=num_boost_round,
98 rank=rank,
---> 99 evaluation_result_list=evaluation_result_list))
100 except EarlyStopException:
101 break
/usr/local/lib/python3.5/site-packages/xgboost-0.6-py3.5.egg/xgboost/callback.py in callback(env)
196 def callback(env):
197 """internal function"""
--> 198 score = env.evaluation_result_list[-1][1]
199 if len(state) == 0:
200 init(env)
IndexError: list index out of range
whereas a direct call on the estimator,
estimator.fit(X_train, y_train, early_stopping_rounds=20)
works just fine.
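The traceback shows the error is raised inside XGBoost's early-stopping callback rather than in the Pipeline, and the `clf__` prefix itself behaves as intended: the Pipeline strips the step name and passes the remaining keyword to that step's `fit`. The sketch below is a simplified, library-free model of that routing; `route_fit_params` is a hypothetical name, not scikit-learn's internal code, and it assumes the `<step>__<param>` convention used in the question.

```python
def route_fit_params(step_names, **fit_params):
    """Simplified model of Pipeline fit-parameter routing (illustrative only).

    Each key must look like "<step>__<param>": the prefix selects the step,
    and the remainder becomes a keyword argument for that step's fit().
    """
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        # Split on the FIRST "__" only, so "clf__a__b" routes "a__b" to "clf".
        step, sep, param = key.partition("__")
        if not sep or step not in routed:
            raise ValueError(f"cannot route fit parameter {key!r}")
        routed[step][param] = value
    return routed


# Mirrors the call in the question: only the 'clf' step receives the kwarg.
routed = route_fit_params(["clf"], clf__early_stopping_rounds=20)
print(routed)  # {'clf': {'early_stopping_rounds': 20}}
```

So `early_stopping_rounds=20` arrives at the classifier's `fit` unchanged; whatever goes wrong happens after the routing step.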
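Answer (score 11):
Early stopping always requires a validation set, given through the eval_set parameter. Without it, XGBoost has nothing to evaluate, so the evaluation result list is empty and the callback's `env.evaluation_result_list[-1]` raises the IndexError. Here is how to fix your code: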
pipeline.fit(X_train, y_train, clf__early_stopping_rounds=20, clf__eval_set=[(test_X, test_y)])
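To catch this mistake before training starts, a small pre-flight check on the fit parameters can replace the opaque IndexError with a readable message. This is a minimal sketch: `check_early_stopping` is a hypothetical helper, not part of scikit-learn or XGBoost, and it assumes the `<step>__<param>` naming from the question.

```python
def check_early_stopping(step, fit_params):
    """Fail fast if early stopping is requested without a validation set.

    `step` is the pipeline step name (e.g. "clf"); `fit_params` is the dict
    of keyword arguments that will be passed on to Pipeline.fit().
    """
    rounds = fit_params.get(f"{step}__early_stopping_rounds")
    eval_set = fit_params.get(f"{step}__eval_set")
    if rounds is not None and not eval_set:
        raise ValueError(
            f"{step}__early_stopping_rounds={rounds} needs a non-empty "
            f"{step}__eval_set, e.g. [(X_valid, y_valid)]"
        )
    return fit_params


# The failing call from the question is rejected up front:
try:
    check_early_stopping("clf", {"clf__early_stopping_rounds": 20})
except ValueError as err:
    print(err)

# The fixed call passes (placeholder lists stand in for real data):
ok = check_early_stopping(
    "clf",
    {"clf__early_stopping_rounds": 20, "clf__eval_set": [([[0.0]], [0])]},
)
```

In real code the checked dict would be unpacked into the call, e.g. `pipeline.fit(X_train, y_train, **check_early_stopping("clf", params))`.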
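This is the solution: https://www.kaggle.com/c/otto-group-product-classification-challenge/forums/t/13755/xgboost-early-stopping-and-other-issues : early_stopping_rounds needs a watchlist, i.e. eval_set must be passed. Unfortunately, this does not work for me, because the data in the watchlist needs a preprocessing step that is only applied inside the pipeline, so I would have to apply that step manually.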
Answer (score 6):
I recently used the following steps to pass the eval_metric and eval_set parameters to XGBoost inside a pipeline.
# `pipeline` is the answerer's own module exposing Pipeline and a full pipeline `cost_pipe`
# whose last step is "xgboost_model"; FEATURES, ERROR_METRIC and save_path are defined elsewhere.
import joblib
# 1. Copy all preprocessing steps (everything except the final model).
pipeline_temp = pipeline.Pipeline(pipeline.cost_pipe.steps[:-1])
# 2. Fit them on the training data, then transform train and test sets the same way.
X_trans = pipeline_temp.fit_transform(X_train[FEATURES], y_train)
eval_set = [(X_trans, y_train), (pipeline_temp.transform(X_test[FEATURES]), y_test)]
# 3. Put the XGBoost model back as the final step.
pipeline_temp.steps.append(pipeline.cost_pipe.steps[-1])
# 4. Fit the full pipeline; the already-transformed eval_set goes straight to the model.
pipeline_temp.fit(X_train[FEATURES], y_train,
                  xgboost_model__eval_metric=ERROR_METRIC,
                  xgboost_model__eval_set=eval_set)
joblib.dump(pipeline_temp, save_path)
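The same idea can be written so the preprocessing steps are fitted only once: fit the transformers on the training data, transform the validation data with them, and hand both transformed sets to the final estimator as its `eval_set`. The sketch below is duck-typed and library-free so it runs anywhere; it assumes only that transformers expose `fit_transform`/`transform` and that the final estimator's `fit` accepts an `eval_set` keyword, as scikit-learn transformers and `XGBClassifier` do. The function `fit_with_eval_set` and the toy classes `AddOne` and `RecordingModel` are hypothetical stand-ins.

```python
def fit_with_eval_set(steps, X_train, y_train, X_valid, y_valid, **final_fit_kwargs):
    """Fit a list of (name, step) pairs, giving the final step a validation
    set that went through the same preprocessing as the training data.

    All steps except the last need fit_transform() and transform();
    the last step's fit() must accept an `eval_set` keyword.
    """
    *transformers, (_final_name, final_est) = steps

    Xt_train, Xt_valid = X_train, X_valid
    for _name, transformer in transformers:
        # Fit on training data only, then apply the same mapping to validation data.
        Xt_train = transformer.fit_transform(Xt_train, y_train)
        Xt_valid = transformer.transform(Xt_valid)

    final_est.fit(
        Xt_train,
        y_train,
        eval_set=[(Xt_train, y_train), (Xt_valid, y_valid)],
        **final_fit_kwargs,
    )
    return steps


class AddOne:
    """Toy transformer: adds 1 to every value (stands in for real preprocessing)."""

    def fit_transform(self, X, y=None):
        return self.transform(X)

    def transform(self, X):
        return [[v + 1 for v in row] for row in X]


class RecordingModel:
    """Toy estimator that only records what fit() received."""

    def fit(self, X, y, eval_set=None, **kwargs):
        self.eval_set_ = eval_set
        self.fit_kwargs_ = kwargs
        return self


steps = [("prep", AddOne()), ("xgboost_model", RecordingModel())]
fit_with_eval_set(steps, [[1], [2]], [0, 1], [[10]], [1], early_stopping_rounds=20)
print(steps[-1][1].eval_set_)  # [([[2], [3]], [0, 1]), ([[11]], [1])]
```

With a real scikit-learn pipeline, the preprocessing part can be taken as `Pipeline(pipeline.steps[:-1])`, as in the answer, or, in scikit-learn 0.21 and later, by slicing `pipeline[:-1]`.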