Related questions (0)

scikit-learn GridSearchCV with multiple repetitions

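I am trying to find the best parameter set for an SVR model. I would like to run GridSearchCV over different values of C. However, earlier tests showed that the way the data is split into training and test sets strongly affects the overall performance (r2 in this case). To reduce this effect, I would like to use repeated 5-fold cross-validation (10 x 5CV). Is there a built-in way to do this with GridSearchCV?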

Quick solution:

Following the idea presented in the official scikit-learn documentation, a quick solution looks like this:

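import numpy
from sklearn.model_selection import GridSearchCV, KFold

# svr and p_grid come from the question; X and y (the data) are assumed to exist
NUM_TRIALS = 10
scores = []
for i in range(NUM_TRIALS):
    # a different random_state per trial gives a different 5-fold split
    cv = KFold(n_splits=5, shuffle=True, random_state=i)
    clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=cv)
    clf.fit(X, y)  # best_score_ is only available after fitting
    scores.append(clf.best_score_)
print("Average Score: {0} STD: {1}".format(numpy.mean(scores), numpy.std(scores)))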
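As for a built-in route, scikit-learn also ships `RepeatedKFold`, which can be passed directly as `cv`. The sketch below shows 10 repetitions of 5-fold CV inside a single `GridSearchCV`. The synthetic dataset and the values in `p_grid` are assumptions for illustration only, since the question shows neither.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.svm import SVR

# Synthetic data stands in for the asker's dataset (an assumption).
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# Hypothetical grid over C; the original p_grid is not shown in the question.
p_grid = {"C": [0.1, 1, 10, 100]}

# 5 folds repeated 10 times: every candidate is scored on 50 splits.
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
search = GridSearchCV(estimator=SVR(), param_grid=p_grid, cv=cv, scoring="r2")
search.fit(X, y)

n_splits_used = cv.get_n_splits()
best_index = search.best_index_
mean_score = search.cv_results_["mean_test_score"][best_index]
std_score = search.cv_results_["std_test_score"][best_index]

print("Best C:", search.best_params_["C"])
print("Mean r2: {0:.3f} STD: {1:.3f}".format(mean_score, std_score))
```

Unlike the loop above, this selects one best `C` across all 50 splits rather than averaging the best score of 10 separate searches, so the two numbers are not directly comparable.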


python scikit-learn cross-validation grid-search

Tit*_*llo

2017-02-16

9 votes, 2 answers, 7478 views

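How do I correctly use GridSearchCV together with cross_val_score?

Currently I have the following code:

I first split the dataset into a training set and a test set. Then I run GridSearchCV to try to find the best parameters. Once the best parameters are found, I evaluate the classifier with those parameters using cross_val_score. Is this an acceptable approach?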
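One pattern often suggested for this situation is nested cross-validation, where the `GridSearchCV` object itself is passed to `cross_val_score`, so that parameter tuning is repeated inside each outer training fold. The following is a minimal sketch of that idea, not the asker's code (which is not shown). The iris dataset, the `SVC` estimator, and the grid values are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

# Example data and grid (assumptions; the question does not include them).
X, y = load_iris(return_X_y=True)
param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

# The inner loop tunes parameters; the outer loop estimates generalization.
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)

search = GridSearchCV(SVC(kernel="rbf"), param_grid=param_grid, cv=inner_cv)

# Each outer fold runs a full grid search on its training part only,
# then scores the refit best model on the held-out part.
nested_scores = cross_val_score(search, X, y, cv=outer_cv)

print("Nested CV accuracy: {0:.3f} +/- {1:.3f}".format(
    nested_scores.mean(), nested_scores.std()))
```

Scoring the tuned classifier with `cross_val_score` on the same data that was used for tuning tends to give an optimistic estimate, because the parameters were chosen to do well on exactly those samples.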

python scikit-learn

Wew*_*Lad

2018-07-03

5 votes, 1 answer, 3365 views

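Help with a Scikit-learn model when using GridSearch

As part of the Enron project, I built the model below. Here is a summary of the steps.

The following model gives nearly perfect scores: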

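from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit

# pipe, clf_params, features and labels are defined earlier in the project
cv = StratifiedShuffleSplit(n_splits=100, test_size=0.2, random_state=42)
gcv = GridSearchCV(pipe, clf_params, cv=cv)

gcv.fit(features, labels)  # fit with the full dataset

for train_ind, test_ind in cv.split(features, labels):
    x_train, x_test = features[train_ind], features[test_ind]
    y_train, y_test = labels[train_ind], labels[test_ind]

    # best_estimator_ was refit on the full dataset, so it has already seen x_test
    predictions = gcv.best_estimator_.predict(x_test)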


The following model gives more reasonable, but lower, scores:

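from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit

# pipe, clf_params, features and labels are defined earlier in the project
cv = StratifiedShuffleSplit(n_splits=100, test_size=0.2, random_state=42)
gcv = GridSearchCV(pipe, clf_params, cv=cv)

gcv.fit(features, labels)  # fit with the full dataset

for train_ind, test_ind in cv.split(features, labels):
    x_train, x_test = features[train_ind], features[test_ind]
    y_train, y_test = labels[train_ind], labels[test_ind]

    # refit on the training part only, then predict on unseen samples
    gcv.best_estimator_.fit(x_train, y_train)
    predictions = gcv.best_estimator_.predict(x_test)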
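A likely reason for the gap between the two variants: with the default `refit=True`, `best_estimator_` is refit on everything passed to `gcv.fit`, so in the first variant every `x_test` was part of its training data. One way to avoid this, sketched below, is to hold out a test set before the grid search. The synthetic data, the pipeline steps, and the `clf__max_depth` grid are hypothetical stand-ins, since the original `pipe` and `clf_params` are not shown.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import (GridSearchCV, StratifiedShuffleSplit,
                                     train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Enron features and labels (an assumption).
features, labels = make_classification(n_samples=200, n_features=10,
                                       random_state=42)

# Hypothetical pipeline and grid; the originals are not given.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", DecisionTreeClassifier(random_state=42))])
clf_params = {"clf__max_depth": [2, 4, 8]}

# Hold out a test set that the grid search never sees.
x_train, x_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=42)

cv = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
gcv = GridSearchCV(pipe, clf_params, cv=cv)
gcv.fit(x_train, y_train)  # tuning and refit use the training part only

# Score the refit best model on samples it has not been trained on.
held_out_accuracy = gcv.best_estimator_.score(x_test, y_test)
print("Held-out accuracy: {0:.3f}".format(held_out_accuracy))
```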
