Related problems and solutions (0)

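Does GridSearchCV perform cross-validation?

I am currently working on a problem that compares the performance of three different machine learning algorithms on the same dataset. I split the dataset into 70/30 training/test sets, then ran GridSearchCV on X_train, y_train to grid-search the best parameters for each algorithm.

First question: should I run the grid search on the training set only, or on the whole dataset?

Second question: I know that GridSearchCV uses K-fold in its implementation. Does that mean that, if I pass the same X_train, y_train to GridSearchCV for all three algorithms I am comparing, they all use the same cross-validation?

Any answer would be appreciated, thanks.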
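A minimal sketch of the setup described above, under stated assumptions: the synthetic data, the three estimators, and their parameter grids are hypothetical stand-ins, not the asker's. It fits each GridSearchCV on the training split only and passes one seeded splitter object to every search, so all three algorithms are tuned on the same folds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the asker's dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# 70/30 split; the test set is kept out of the parameter search entirely.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# One splitter with a fixed seed, reused for every search, so each
# algorithm is tuned on exactly the same folds of X_train.
shared_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Hypothetical estimators and grids, chosen only for illustration.
candidates = {
    "svc": (SVC(), {"C": [0.1, 1, 10]}),
    "logreg": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "forest": (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=shared_cv)
    search.fit(X_train, y_train)  # the cross-validation happens inside fit
    results[name] = {
        "best_params": search.best_params_,
        "cv_score": search.best_score_,
        "test_score": search.score(X_test, y_test),
    }

for name, res in results.items():
    print(name, res)
```

Passing an explicit splitter makes the shared folds visible in the code. With the default cv setting and a classifier, GridSearchCV uses an unshuffled stratified 5-fold split, which is also deterministic for a given X_train, y_train.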

python machine-learning scikit-learn cross-validation grid-search

kev*_*inH

2018 12-29

16
Score

2
Answers

3783
Views

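Nested cross-validation example on Scikit-learn

I am trying to work through the nested vs. non-nested CV example in Sklearn. I have checked multiple answers, but I am still confused by the example. As far as I understand, nested CV aims to use different subsets of the data to select the best parameters of a classifier (e.g. C in an SVM) and to validate its performance. So, from a dataset X, the outer 10-fold CV (n=10 for simplicity) creates 10 training sets and 10 test sets: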

(Tr0, Te0),..., (Tr9, Te9)


Then the inner 10-fold CV splits each outer training set into 10 training sets and 10 test sets:

From Tr0: (Tr0_0,Te0_0), ... , (Tr0_9,Te0_9)
From Tr9: (Tr9_0,Te9_0), ... , (Tr9_9,Te9_9)


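Now, using the inner CV, we can find the best value of C for each outer training set. This is done by testing every candidate value of C with the inner CV; the value that gives the highest performance (e.g. accuracy) is selected for that particular outer training set. Finally, once the best C has been found for each outer training set, we can compute an unbiased accuracy using the outer test sets. With this procedure, the samples used to identify the best parameter (i.e. C) are not used to compute the classifier's performance, so we have a fully unbiased validation.

The example provided on the Sklearn page is: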
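The procedure described above can be sketched by hand, without GridSearchCV, to make the two loops explicit. This is only an illustration under assumptions: the iris data, the candidate list C_values, and the seeds are hypothetical choices, not taken from the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
C_values = [0.1, 1, 10, 100]  # hypothetical candidate values for C

outer_cv = KFold(n_splits=10, shuffle=True, random_state=0)
inner_cv = KFold(n_splits=10, shuffle=True, random_state=1)

outer_scores, chosen_C = [], []
for tr_idx, te_idx in outer_cv.split(X):  # one (Tr_i, Te_i) pair per iteration
    X_tr, y_tr = X[tr_idx], y[tr_idx]

    # Inner CV: score every candidate C using only the outer training set Tr_i.
    inner_means = [
        cross_val_score(SVC(kernel="rbf", C=C), X_tr, y_tr, cv=inner_cv).mean()
        for C in C_values
    ]
    best_C = C_values[int(np.argmax(inner_means))]
    chosen_C.append(best_C)

    # Refit on all of Tr_i with the selected C, then score on the untouched Te_i.
    model = SVC(kernel="rbf", C=best_C).fit(X_tr, y_tr)
    outer_scores.append(model.score(X[te_idx], y[te_idx]))

print("C chosen per outer fold:", chosen_C)
print("nested CV accuracy: %.3f" % np.mean(outer_scores))
```

Note that different outer folds may select different values of C. The outer score estimates the performance of the whole tuning procedure rather than of one fixed C.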

inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)

# Non_nested parameter search and scoring
clf = GridSearchCV(estimator=svm, …


python nested scikit-learn cross-validation grid-search

NCL*_*NCL

2017 10-06

10
Score

0
Answers

7734
Views

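Explicitly specifying test/train sets in GridSearchCV

I have a question about the cv parameter of sklearn's GridSearchCV.

I am working with data that has a time component, so random shuffling within KFold cross-validation does not seem sensible to me.

Instead, I would like to explicitly specify the cutoffs for the training, validation, and test data within GridSearchCV. Can I do this?

To better illustrate the question, here is how I would do it manually.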

import numpy as np
import pandas as pd
from sklearn.datasets import make_regression  # needed for make_regression below
from sklearn.linear_model import Ridge
np.random.seed(444)

index = pd.date_range('2014', periods=60, freq='M')
X, y = make_regression(n_samples=60, n_features=3, random_state=444, noise=90.)
X = pd.DataFrame(X, index=index, columns=list('abc'))
y = pd.Series(y, index=index, name='y')

# Train on the first 35 samples, validate on the next 15, test on
#     the final 10 (split points [35, 50] over 60 samples).
X_train, X_val, X_test = np.array_split(X, [35, 50])
y_train, y_val, y_test = np.array_split(y, [35, 50])

param_grid = {'alpha': …

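One possible way to express such fixed cutoffs, sketched under assumptions: the cv parameter of GridSearchCV also accepts an iterable of (train indices, validation indices) pairs, so a single hand-written pair reproduces the manual split. The alpha values below are hypothetical, since the original grid is truncated, and plain NumPy arrays stand in for the DataFrame.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=60, n_features=3, random_state=444, noise=90.)

# Chronological cutoffs matching the manual split: 35 train, 15 validation, 10 test.
train_idx = np.arange(0, 35)
val_idx = np.arange(35, 50)
test_idx = np.arange(50, 60)

param_grid = {"alpha": [0.1, 1.0, 10.0]}  # hypothetical values for illustration

# A single (train, validation) pair: no shuffling, no extra folds.
search = GridSearchCV(Ridge(), param_grid, cv=[(train_idx, val_idx)])

# Only the first 50 rows are given to the search; the last 10 stay unseen.
search.fit(X[:50], y[:50])

print("best alpha:", search.best_params_["alpha"])
print("validation R^2:", search.best_score_)
print("test R^2:", search.score(X[test_idx], y[test_idx]))
```

With the default refit=True, the best model is refitted on all 50 rows passed to fit (train plus validation) before it is scored on the test rows. TimeSeriesSplit from sklearn.model_selection is an alternative when several expanding-window folds are preferred over one fixed cutoff.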

python scikit-learn grid-search

Bra*_*mon

2018 01-23

5
Score

2
Answers

4950
Views

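Calling nested cross-validation in SkLearn

I was reading the SkLearn documentation on nested cross-validation, and on this SkLearn page I found the following nested cross-validation example: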

from sklearn.datasets import load_iris
from matplotlib import pyplot as plt
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score, KFold
import numpy as np

print(__doc__)

# Number of random trials
NUM_TRIALS = 30

# Load the dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target

# Set up possible values of parameters to optimize over
p_grid = {"C": [1, 10, 100],
          "gamma": [.01, .1]}

# We will use a Support Vector Classifier with …


python machine-learning scikit-learn cross-validation

Poe*_*dit

2018 09-17

5
Score

0
Answers

833
Views

Nested cross-validation: how does cross_validate handle GridSearchCV as its input estimator?

The following code combines cross_validate with GridSearchCV to perform nested cross-validation of an SVC on the iris dataset.

(Modified example from the following documentation page: https://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html#sphx-glr-auto-examples-model-selection-plot-nested-cross-validation-iris-py.)

from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_validate, KFold
import numpy as np
np.set_printoptions(precision=2)

# Load the dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target

# Set up possible values of parameters to optimize over
p_grid = {"C": [1, 10],
          "gamma": [.01, .1]}

# We will use a Support Vector Classifier with "rbf" kernel
svm = SVC(kernel="rbf")

# …

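A small sketch of what cross_validate appears to do with a GridSearchCV estimator: for each outer fold it fits a fresh clone of the search on the outer training part (which runs the inner grid search and the refit) and scores it on the outer test part. Since the question's code is truncated, the splitters and seeds below are assumptions; the hand-written loop is compared against cross_validate to check the equivalence.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_validate
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
p_grid = {"C": [1, 10], "gamma": [.01, .1]}

# Assumed splitters; seeds are fixed so both code paths see the same folds.
inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)
search = GridSearchCV(SVC(kernel="rbf"), param_grid=p_grid, cv=inner_cv)

# Nested CV through cross_validate.
cv_out = cross_validate(search, X, y, cv=outer_cv)

# Hand-written equivalent: one unfitted clone of the search per outer fold.
manual_scores = []
for train_idx, test_idx in outer_cv.split(X):
    fold_search = clone(search)
    fold_search.fit(X[train_idx], y[train_idx])  # inner grid search + refit
    manual_scores.append(fold_search.score(X[test_idx], y[test_idx]))

print(cv_out["test_score"])
print(np.array(manual_scores))
```

In this setup the two score arrays should match, which suggests that the GridSearchCV object is treated like any other estimator and that the inner search never sees the outer test fold.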

python nested python-3.x scikit-learn cross-validation

zwi*_*uta

2019 05-16

5
Score

1
Answers

781
Views

How to access Scikit Learn nested cross-validation scores

I am using Python and would like to use nested cross-validation with scikit learn. I found a very good example:

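import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

# Assumed setup, not shown in the question: iris data, an RBF SVC named
# `svr`, and a small parameter grid.
X_iris, y_iris = load_iris(return_X_y=True)
p_grid = {"C": [1, 10, 100], "gamma": [.01, .1]}
svr = SVC(kernel="rbf")

NUM_TRIALS = 30
non_nested_scores = np.zeros(NUM_TRIALS)
nested_scores = np.zeros(NUM_TRIALS)

for i in range(NUM_TRIALS):  # assumed loop: `i` is the trial index used below
    # Choose cross-validation techniques for the inner and outer loops,
    # independently of the dataset.
    # E.g. "GroupKFold", "LeaveOneOut", "LeaveOneGroupOut", etc.
    inner_cv = KFold(n_splits=4, shuffle=True, random_state=i)
    outer_cv = KFold(n_splits=4, shuffle=True, random_state=i)

    # Non-nested parameter search and scoring
    clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
    clf.fit(X_iris, y_iris)
    non_nested_scores[i] = clf.best_score_

    # Nested CV with parameter optimization
    nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv)
    nested_scores[i] = nested_score.mean()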


How can I access the best parameter set, as well as all parameter sets (with their corresponding scores), in nested cross-validation?
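One possible approach, sketched under assumptions: cross_val_score returns only the outer scores, but cross_validate with return_estimator=True also returns the fitted GridSearchCV object of each outer fold, and each of those exposes best_params_ and cv_results_. The data, grid, and seeds below are illustrative stand-ins.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_validate
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)
p_grid = {"C": [1, 10, 100], "gamma": [.01, .1]}  # assumed grid

inner_cv = KFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=0)
clf = GridSearchCV(estimator=SVC(kernel="rbf"), param_grid=p_grid, cv=inner_cv)

# return_estimator=True keeps the fitted search object of every outer fold.
nested = cross_validate(clf, X=X_iris, y=y_iris, cv=outer_cv, return_estimator=True)

for fold, (search, score) in enumerate(zip(nested["estimator"], nested["test_score"])):
    print("outer fold %d: best params %s, outer score %.3f"
          % (fold, search.best_params_, score))
    # Every parameter set tried in this fold, with its mean inner-CV score.
    for params, mean_score in zip(search.cv_results_["params"],
                                  search.cv_results_["mean_test_score"]):
        print("   ", params, "%.3f" % mean_score)
```

There is one best parameter set per outer fold, not a single global one, because each outer training set runs its own inner search.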

python machine-learning scikit-learn cross-validation grid-search

mac*_*ery

2018 08-25

4
Score

1
Answers

2963
Views

TypeError: 'ShuffleSplit' object is not iterable

I am using ShuffleSplit to shuffle the data, but I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-36-192f7c286a58> in <module>()
      1 # Fit the training data to the model using grid search
----> 2 reg = fit_model(X_train, y_train)
      3 
      4 # Produce the value for 'max_depth'
      5 print "Parameter 'max_depth' is {} for the optimal model.".format(reg.get_params()['max_depth'])

<ipython-input-34-18b2799e585c> in fit_model(X, y)
     32 
     33     # Fit the grid search object to the data to compute the optimal model
---> 34     grid = grid.fit(X, y)
     35 
     36     # Return the optimal …

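The traceback is truncated, so the cause cannot be confirmed from it. One common cause of this message, offered here only as an assumption, is mixing the current sklearn.model_selection.ShuffleSplit with code written for the older iterable splitter: the model_selection object is not iterable by itself and produces its index pairs through split(). The sketch below uses a hypothetical DecisionTreeRegressor and synthetic data; only the max_depth parameter name comes from the traceback.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the asker's data (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# model_selection.ShuffleSplit takes n_splits rather than the data size.
cv_sets = ShuffleSplit(n_splits=10, test_size=0.20, random_state=0)

# The splitter object itself cannot be iterated over directly.
try:
    iter(cv_sets)
    iterable = True
except TypeError:
    iterable = False
print("ShuffleSplit object iterable?", iterable)

# Option 1: pass the splitter object to model_selection.GridSearchCV.
params = {"max_depth": list(range(1, 11))}
grid = GridSearchCV(DecisionTreeRegressor(random_state=0), params, cv=cv_sets)
grid = grid.fit(X, y)
best_depth = grid.best_estimator_.get_params()["max_depth"]
print("Parameter 'max_depth' is {} for the optimal model.".format(best_depth))

# Option 2: materialise the (train, test) index pairs explicitly.
splits = list(cv_sets.split(X))
print("number of explicit splits:", len(splits))
```

If the surrounding code must keep iterating over the splits itself, the list from split(X) can be used wherever an iterable of index pairs is expected.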

python scikit-learn grid-search

Goi*_*Way

2018 08-25

2
Score

1
Answers

3240
Views