How to customize a sklearn cross-validation iterator by indices?

tan*_*ngy 10 python validation scikit-learn cross-validation

Similar to Custom cross validation split sklearn, I want to define my own splits for GridSearchCV, so I need to customize the built-in cross-validation iterator.

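I want to pass my own set of train/test indices to GridSearch instead of letting the iterator determine them for me. I looked through the available cv iterators on the sklearn documentation page but couldn't find one that does this.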

For example, I want to implement something like this: for data with 9 samples and 2-fold CV, I create my own set of train/test indices:

>>> train_indices = [[1,3,5,7,9],[2,4,6,8]]   # 1st fold, 2nd fold
>>> test_indices = [[2,4,6,8],[1,3,5,7,9]]    # 1st fold, 2nd fold
>>> custom_cv = sklearn.cross_validation.customcv(train_indices,test_indices)  # hypothetical: this is what I'm looking for
>>> clf = GridSearchCV(estimator, params, cv=custom_cv)
>>> clf.fit(X, y)

What can I use that works like customcv?

eic*_*erg 12

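Actually, cross-validation iterators are just that: iterators. On each iteration they yield one (train, test) pair of index arrays. This should work for you: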

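custom_cv = list(zip(train_indices, test_indices))  # one (train, test) tuple per fold; list() keeps it reusable, since zip() is a one-shot iterator in Python 3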
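A minimal end-to-end sketch of this list-of-tuples approach, assuming a recent scikit-learn where GridSearchCV lives in sklearn.model_selection. The Ridge model, the alpha grid and the toy data are illustrative assumptions, not part of the question. Note that GridSearchCV takes the estimator and parameter grid in its constructor and the data in fit(). Also, because index 9 appears in the index lists, X needs at least 10 rows (with 0-based indexing, 9 samples only have indices 0..8).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Index lists from the question (0-based, so X needs at least 10 rows).
train_indices = [[1, 3, 5, 7, 9], [2, 4, 6, 8]]
test_indices = [[2, 4, 6, 8], [1, 3, 5, 7, 9]]

# One (train, test) tuple per fold; list() materializes the zip.
custom_cv = list(zip(train_indices, test_indices))

# Toy data (assumption): 10 samples, one feature, an exactly linear target.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# Estimator and grid go to the constructor; the data goes to fit().
search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0]}, cv=custom_cv)
search.fit(X, y)
print(search.n_splits_)  # prints 2: one split per tuple in custom_cv
```

Each tuple becomes one fold, so cv_results_ contains split0_* and split1_* score columns and nothing beyond that.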

Alternatively, for the specific case you mention, you can do:

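# Legacy API: sklearn.cross_validation was removed in scikit-learn 0.20;
# LeaveOneLabelOut is now sklearn.model_selection.LeaveOneGroupOut.
import numpy as np
from sklearn.cross_validation import LeaveOneLabelOut
labels = np.arange(0, 10) % 2  # alternating labels 0, 1, 0, 1, ...
cv = LeaveOneLabelOut(labels)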

Note that list(cv) yields:

[(array([1, 3, 5, 7, 9]), array([0, 2, 4, 6, 8])),
 (array([0, 2, 4, 6, 8]), array([1, 3, 5, 7, 9]))]
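sklearn.cross_validation no longer exists in current scikit-learn (it was removed in 0.20), and LeaveOneLabelOut was renamed LeaveOneGroupOut in sklearn.model_selection. A sketch of the same split with the newer API, assuming a recent scikit-learn. The labels are now passed to split() as groups rather than to the constructor, and the all-zeros X is a placeholder, since only its length matters to this splitter.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Same grouping as above: even-indexed samples in group 0, odd in group 1.
groups = np.arange(0, 10) % 2
X = np.zeros((10, 1))  # placeholder features; only len(X) is used

cv = LeaveOneGroupOut()
folds = list(cv.split(X, groups=groups))
for train, test in folds:
    print(train, test)  # one line per held-out group
```

This should produce the same two folds as the list(cv) output shown above. When passing this splitter to GridSearchCV, the groups go to fit instead: search.fit(X, y, groups=groups).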


Cib*_*bic 5

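Actually, the solution above returns each row as a fold; what is really needed is a list of (train_indices, test_indices) tuples, one per fold: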

[(train_indices, test_indices)]  # for one fold

[(train_indices, test_indices),  # 1st fold
 (train_indices, test_indices)]  # 2nd fold, etc.
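If the folds should travel as one reusable object rather than a bare list, a small class can wrap the same list of (train, test) tuples. This is a sketch under assumptions: the class name IndexListSplit is hypothetical (it is not part of scikit-learn), and it relies on scikit-learn's model-selection tools accepting as cv any object that provides split() and get_n_splits(). It also checks the indices against len(X), which would catch an index of 9 when X has only 9 rows.

```python
import numpy as np


class IndexListSplit:
    """Hypothetical cv splitter that yields user-supplied (train, test) folds.

    Provides split() and get_n_splits(), the two methods scikit-learn's
    model-selection tools call on a cv object.
    """

    def __init__(self, train_indices, test_indices):
        if len(train_indices) != len(test_indices):
            raise ValueError("need exactly one test list per train list")
        # Store one (train, test) tuple of integer arrays per fold.
        self.folds = [
            (np.asarray(train, dtype=int), np.asarray(test, dtype=int))
            for train, test in zip(train_indices, test_indices)
        ]

    def get_n_splits(self, X=None, y=None, groups=None):
        return len(self.folds)

    def split(self, X=None, y=None, groups=None):
        n_samples = None if X is None else len(X)
        for train, test in self.folds:
            # A sample must not be in both halves of the same fold.
            if np.intersect1d(train, test).size:
                raise ValueError("train and test indices overlap")
            used = np.concatenate([train, test])
            # Catch indices that point past the end of X (0-based indexing).
            if n_samples is not None and used.size and used.max() >= n_samples:
                raise IndexError(
                    f"index {used.max()} out of range for {n_samples} samples"
                )
            yield train, test


cv = IndexListSplit([[1, 3, 5, 7, 9], [2, 4, 6, 8]],
                    [[2, 4, 6, 8], [1, 3, 5, 7, 9]])
for train, test in cv.split(np.zeros((10, 1))):
    print(train, test)
```

Usage would then look like GridSearchCV(estimator, params, cv=IndexListSplit(train_indices, test_indices)).fit(X, y). Because split() is a generator, the range check runs only when the folds are actually iterated.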