xgboost CV with custom folds in Python

Question

Oks*_*ana 4 python cross-validation xgboost

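I am working with data in which each patient can have a different number of training examples. When running XGBoost CV, I want to make sure that all data from a given patient stays within a single fold, so I need folds that may contain different numbers of indices.

When I pass a list of numpy arrays of indices through the 'folds' parameter of xgb.cv, I get:

dtrain = dall.slice(np.concatenate([idset[i] for i in range(nfold) if k != i]))
ValueError: zero-dimensional arrays cannot be concatenated

I implemented the same procedure in R without any problems by passing my custom folds as a list in which each element is a vector of test-fold indices.

Could you suggest the correct way to pass custom indices to the Python XGBoost CV function? Thanks!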

Answer 1

Ali*_*Ali 5

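This is an old question, but it came up in a Google search when I ran into a similar problem, so here is an answer.

I wanted to use TimeSeriesSplit with xgboost cv but could not use it directly, because the folds parameter expects a KFold or StratifiedKFold instance. However, you can supply your own indices as a list of (train, test) tuples, like so: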

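import xgboost as xgb

# Each fold is a (train_indices, test_indices) pair of row positions in trainDMat.
# Folds may differ in size; a row should not sit in both halves of the same fold.
train1 = [0, 1, 2, 3, 4]
test1 = [5, 6, 7, 8]

train2 = [9, 10, 11, 12, 13]
test2 = [14, 15, 16, 17, 18]

train3 = [19, 20, 21, 22, 23, 24]
test3 = [25, 26, 27, 28, 29, 30]

tsFolds = [(train1, test1), (train2, test2), (train3, test3)]

# parameters, trainDMat, num_boost_round, early_stopping_rounds and seed
# are assumed to be defined elsewhere.
xgbCV = xgb.cv(
    params=parameters,
    dtrain=trainDMat,
    num_boost_round=num_boost_round,
    nfold=len(tsFolds),
    folds=tsFolds,
    metrics="rmse",
    early_stopping_rounds=early_stopping_rounds,
    verbose_eval=True,
    seed=seed,
)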
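
For the per-patient case in the question, the same (train, test) tuple format can be built from a list of patient IDs. The following is only a minimal sketch in plain Python: the helper names group_test_sets and to_train_test_folds, the patient_ids data, and the greedy balancing rule are all assumptions made for illustration, not part of xgboost. The second helper also covers the R-style input, where only the test indices of each fold are given.

```python
from collections import defaultdict


def group_test_sets(patient_ids, n_folds):
    """Hypothetical helper: put each patient's rows into exactly one test fold."""
    rows_by_patient = defaultdict(list)
    for row, pid in enumerate(patient_ids):
        rows_by_patient[pid].append(row)

    test_sets = [[] for _ in range(n_folds)]
    # Greedy balancing (an assumption): largest patients first,
    # each one placed into the fold that currently has the fewest rows.
    by_size = sorted(rows_by_patient, key=lambda p: -len(rows_by_patient[p]))
    for pid in by_size:
        smallest = min(test_sets, key=len)
        smallest.extend(rows_by_patient[pid])
    return [sorted(rows) for rows in test_sets]


def to_train_test_folds(test_sets, n_rows):
    """Hypothetical helper: turn test-index lists into (train, test) tuples."""
    folds = []
    for test in test_sets:
        held_out = set(test)
        # Training rows are simply all rows not held out in this fold.
        train = [i for i in range(n_rows) if i not in held_out]
        folds.append((train, list(test)))
    return folds


# Made-up example: one patient ID per row, patients have 1 to 4 rows each.
patient_ids = ["a", "a", "a", "b", "c", "c", "d", "d", "d", "d"]
test_sets = group_test_sets(patient_ids, n_folds=3)
custom_folds = to_train_test_folds(test_sets, n_rows=len(patient_ids))

# custom_folds can then be passed as folds=custom_folds to xgb.cv,
# in the same way as tsFolds above.
```

If scikit-learn is available, list(GroupKFold(n_splits=3).split(X, y, groups=patient_ids)) yields the same kind of (train, test) index pairs and should be usable in the same place.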

