XGBoost CV and best iteration

Question

Chr*_*rry 6 python statistics machine-learning xgboost

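I am using XGBoost's cv to find the optimal number of rounds for my model. I would be grateful if someone could confirm (or refute) that the optimal number of rounds is: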

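    import xgboost as xgb

    # params, dvisibletrain and SEED are defined elsewhere
    estop = 40
    # num_boost_round is only a large upper bound; early stopping ends the run
    res = xgb.cv(params, dvisibletrain, num_boost_round=1000000000, nfold=5, early_stopping_rounds=estop, seed=SEED, stratified=True)

    best_nrounds = res.shape[0] - estop
    best_nrounds = int(best_nrounds / 0.8)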

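That is: the total number of rounds completed is res.shape[0], so to get the optimal number of rounds we subtract the number of early stopping rounds.

Then we scale up the number of rounds according to the fraction of the data used for validation. Is that correct?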

Answer 1

Jiv*_*van 4

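Yes, that sounds right, provided that best_nrounds = int(best_nrounds / 0.8) is meant to account for the validation set being 20% of the whole training data (another way of saying that you performed 5-fold cross-validation). Each fold's model sees only 80% of the data, so the final model trained on all of it gets proportionally more rounds.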

The rule can be generalized as:

    n_folds = 5
    best_nrounds = int((res.shape[0] - estop) / (1 - 1 / n_folds))
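As a worked check of the arithmetic, here is a minimal sketch of the same rule as a helper. The name `scale_rounds_for_full_data` is my own and not part of XGBoost. It uses the integer form `rounds * n_folds // (n_folds - 1)`, which is mathematically the same as `int(rounds / (1 - 1 / n_folds))` but avoids floating-point rounding just before the truncation.

```python
def scale_rounds_for_full_data(cv_best_rounds: int, n_folds: int) -> int:
    """Scale a round count found with k-fold CV up to the full training set.

    Hypothetical helper, not an XGBoost API. Each fold trains on
    (n_folds - 1) / n_folds of the data, so dividing by that fraction
    gives rounds * n_folds / (n_folds - 1), truncated to an integer.
    """
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    return cv_best_rounds * n_folds // (n_folds - 1)


# Example: 400 rounds found with 5-fold CV -> 400 / 0.8 = 500
print(scale_rounds_for_full_data(400, 5))  # 500
```

With 10 folds the correction is smaller: 400 rounds become 444, since each fold already trains on 90% of the data.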

Or, if you do not run CV but use a single validation split instead:

    validation_slice = 0.2
    best_nrounds = int((res.shape[0] - estop) / (1 - validation_slice))
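The single-split variant can be sketched the same way. Again, `scale_rounds_for_holdout` is a hypothetical name of mine, and the use of `fractions.Fraction` is just one way to keep a value such as 0.2 exact so that `int()` does not truncate a result like 499.999... down by one.

```python
from fractions import Fraction


def scale_rounds_for_holdout(best_rounds: int, validation_slice: float) -> int:
    """Scale a round count found on a single train/validation split.

    Hypothetical helper, not an XGBoost API. The model was trained on
    (1 - validation_slice) of the data, so divide by that fraction.
    """
    if not 0 < validation_slice < 1:
        raise ValueError("validation_slice must be between 0 and 1")
    # Fraction(str(...)) reads 0.2 as exactly 1/5, avoiding float rounding.
    train_fraction = 1 - Fraction(str(validation_slice))
    return int(best_rounds / train_fraction)


# Example: a 20% holdout gives the same factor as 5-fold CV.
print(scale_rounds_for_holdout(400, 0.2))  # 500
```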

You can see an example of this rule being applied on Kaggle (see the comments).
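One caveat about the res.shape[0] - estop step: it assumes the history returned by xgb.cv still contains the extra estop rounds. As far as I know, recent XGBoost versions truncate the returned history at the best iteration, in which case the subtraction would undercount, so it is worth checking on your version. A way to sidestep the question is to read the best round directly from the metric column. The sketch below works on a plain list; `best_round_from_history` is a hypothetical helper, and the column name "test-rmse-mean" in the docstring is only an example that depends on your eval metric.

```python
def best_round_from_history(test_metric_means, lower_is_better=True):
    """Return the 1-based round with the best mean validation metric.

    Hypothetical helper: `test_metric_means` stands for one column of the
    xgb.cv result, e.g. list(res["test-rmse-mean"]) (the column name
    depends on the eval metric in use).
    """
    values = list(test_metric_means)
    if not values:
        raise ValueError("metric history is empty")
    pick = min if lower_is_better else max
    # Index of the first best value in the history
    best_index = pick(range(len(values)), key=values.__getitem__)
    return best_index + 1  # round counts are 1-based


# Toy history: the metric bottoms out at round 3, then two more rounds ran.
history = [0.50, 0.42, 0.40, 0.41, 0.43]
print(best_round_from_history(history))  # 3
```

The result is the same whether or not the trailing rounds are present, so it can be scaled by the fold or holdout fraction as described above.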
