我有一个数据集,以前分为3组:训练,验证和测试.必须使用这些集合以便比较不同算法的性能.
我现在想使用验证集优化我的SVM的参数.但是,我无法找到如何明确输入验证集sklearn.grid_search.GridSearchCV().下面是我之前用于在训练集上进行K折叠交叉验证的一些代码.但是,对于这个问题,我需要使用给定的验证集.我怎样才能做到这一点?
from sklearn import svm, cross_validation
from sklearn.grid_search import GridSearchCV
# (some code left out to simplify things)
skf = cross_validation.StratifiedKFold(y_train, n_folds=5, shuffle = True)
clf = GridSearchCV(svm.SVC(tol=0.005, cache_size=6000,
class_weight=penalty_weights),
param_grid=tuned_parameters,
n_jobs=2,
pre_dispatch="n_jobs",
cv=skf,
scoring=scorer)
clf.fit(X_train, y_train)
Run Code Online (Sandbox Code Playgroud) 我正在尝试使用我提供的拆分cross_val_score来运行。sklearn该sklearn文档给出了以下示例:
>>> from sklearn.model_selection import PredefinedSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([0, 0, 1, 1])
>>> test_fold = [0, 1, -1, 1]
>>> ps = PredefinedSplit(test_fold)
>>> ps.get_n_splits()
2
>>> print(ps)
PredefinedSplit(test_fold=array([ 0, 1, -1, 1]))
>>> for train_index, test_index in ps.split():
... print("TRAIN:", train_index, "TEST:", test_index)
... X_train, X_test = X[train_index], X[test_index]
... y_train, y_test = y[train_index], y[test_index]
TRAIN: [1 2 3] …Run Code Online (Sandbox Code Playgroud)