XGBoost: what is the difference between bst.best_score, bst.best_iteration and bst.best_ntree_limit?

Question

Lan*_*mes 15 python machine-learning xgboost

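When I train on my data with xgboost for a two-class classification problem, I want to use early stopping to get the best model. I'm confused about which value to use in my prediction, because early stopping returns three different options. For example, should I use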

preds = model.predict(xgtest, ntree_limit=bst.best_iteration)

or should I use

preds = model.predict(xgtest, ntree_limit=bst.best_ntree_limit)

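Or are both correct and meant for different situations? If so, how do I tell which one to use?

Here is the original quote from the xgboost documentation. It does not give a reason, and I could not find a comparison of these parameters anywhere:

Early Stopping

If you have a validation set, you can use early stopping to find the optimal number of boosting rounds. Early stopping requires at least one set in evals. If there's more than one, it will use the last.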

train(..., evals=evals, early_stopping_rounds=10)

The model will train until the validation score stops improving. Validation error needs to decrease at least every early_stopping_rounds to continue training.
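
To make the stopping rule quoted here concrete, this is a minimal pure-Python sketch of the bookkeeping, not xgboost's actual implementation. The function name `track_early_stopping` and the error values are hypothetical. It assumes a metric where lower is better, and returns what would correspond to best_score, best_iteration, and the index of the last round that was actually trained.

```python
from typing import List, Tuple


def track_early_stopping(
    val_errors: List[float], early_stopping_rounds: int
) -> Tuple[float, int, int]:
    """Return (best_score, best_iteration, last_iteration).

    val_errors holds one validation error per boosting round (lower is better).
    Illustrative sketch only; this is not xgboost's code.
    """
    best_score = float("inf")
    best_iteration = -1
    last_iteration = -1
    for i, err in enumerate(val_errors):
        last_iteration = i
        if err < best_score:
            # New best round: remember its score and 0-based index.
            best_score = err
            best_iteration = i
        elif i - best_iteration >= early_stopping_rounds:
            # No improvement for early_stopping_rounds rounds in a row: stop.
            break
    return best_score, best_iteration, last_iteration


# Hypothetical per-round validation errors.
errors = [0.30, 0.25, 0.22, 0.23, 0.24, 0.26, 0.21]
best_score, best_iteration, last_iteration = track_early_stopping(
    errors, early_stopping_rounds=3
)
print(best_score, best_iteration, last_iteration)
```

In this run the best round is index 2, but training continues to index 5 before stopping. That gap is why the model returned after the last round is not the same as the model at the best round.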

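If early stopping occurs, the model will have three additional fields: bst.best_score, bst.best_iteration and bst.best_ntree_limit. Note that train() will return a model from the last iteration, not the best one.

Prediction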

A model that has been trained or loaded can perform predictions on data sets.
import numpy as np
import xgboost as xgb

# 7 entities, each contains 10 features
data = np.random.rand(7, 10)
dtest = xgb.DMatrix(data)
# bst is the trained or loaded Booster
ypred = bst.predict(dtest)

If early stopping is enabled during training, you can get predictions from the best iteration with bst.best_ntree_limit:

ypred = bst.predict(dtest, ntree_limit=bst.best_ntree_limit)

Thanks in advance.

Answer 1

Ant*_*uis 5

In my opinion, the two parameters refer to the same idea, or at least serve the same goal. But I would rather use:

preds = model.predict(xgtest, ntree_limit=bst.best_iteration)

From the source code, we can see that best_ntree_limit is going to be dropped in favor of best_iteration:

def _get_booster_layer_trees(model: "Booster") -> Tuple[int, int]:
    """Get number of trees added to booster per-iteration.  This function will be removed
    once `best_ntree_limit` is dropped in favor of `best_iteration`.  Returns
    `num_parallel_tree` and `num_groups`.
    """
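
The docstring hints at how the two attributes relate: best_iteration counts boosting rounds, while best_ntree_limit counts trees. Here is a minimal sketch of that relationship, assuming the 1.x behaviour as I understand it: best_ntree_limit is (best_iteration + 1) * num_parallel_tree, and predict converts ntree_limit back to rounds by integer division. Both helper names are hypothetical and the formulas are my reading of the code, not a documented contract.

```python
def best_ntree_limit_from(best_iteration: int, num_parallel_tree: int = 1) -> int:
    """Tree count covering rounds 0..best_iteration (assumed 1.x formula)."""
    # best_iteration is a 0-based round index, hence the + 1.
    return (best_iteration + 1) * num_parallel_tree


def rounds_used(ntree_limit: int, num_parallel_tree: int = 1) -> int:
    """Boosting rounds selected by a given ntree_limit (assumed conversion)."""
    return ntree_limit // max(num_parallel_tree, 1)


best_iteration = 2  # hypothetical: the third round scored best
limit = best_ntree_limit_from(best_iteration)
print(rounds_used(limit))           # rounds used when passing best_ntree_limit
print(rounds_used(best_iteration))  # rounds used when passing best_iteration
```

Under these assumptions, passing best_iteration as ntree_limit selects one round fewer than the best one, so the two values are not interchangeable in that argument.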

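In addition, best_ntree_limit has been removed from the EarlyStopping documentation page.

So I think this attribute exists only for backward compatibility. Based on this code snippet and the documentation, we can assume that best_ntree_limit is deprecated or soon will be.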
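
If ntree_limit is on its way out, the replacement in newer releases is the iteration_range argument of predict, which takes a half-open (begin, end) pair of round indices. Here is a minimal sketch of the conversion, with a hypothetical helper name. The commented-out call assumes bst and dtest from the question and an xgboost version that supports iteration_range (1.4 or later, as far as I know).

```python
from typing import Dict, Tuple


def best_iteration_kwargs(best_iteration: int) -> Dict[str, Tuple[int, int]]:
    """Build predict() keyword arguments that stop at the best round."""
    # The range is half-open, so end = best_iteration + 1 includes the best round.
    return {"iteration_range": (0, best_iteration + 1)}


kwargs = best_iteration_kwargs(2)  # hypothetical best_iteration of 2
print(kwargs)

# Assumed usage with a trained Booster `bst` and a DMatrix `dtest`:
# ypred = bst.predict(dtest, **best_iteration_kwargs(bst.best_iteration))
```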
