How to perform GridSearchCV with cross-validation in Python

EmJ*_*EmJ 5 python machine-learning scikit-learn cross-validation

I am tuning the hyperparameters of a RandomForest with GridSearchCV, as shown below.

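import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X = np.array(df[features])         # all features
y = np.array(df['gold_standard'])  # labels

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# rfc was not defined in the original snippet; a default forest is assumed here.
rfc = RandomForestClassifier(random_state=42)

param_grid = {
    'n_estimators': [200, 500],
    # 'auto' (same as 'sqrt' for classifiers) was removed in scikit-learn 1.3
    'max_features': ['auto', 'sqrt', 'log2'],
    'max_depth': [4, 5, 6, 7, 8],
    'criterion': ['gini', 'entropy'],
}
CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=5)
CV_rfc.fit(x_train, y_train)
print(CV_rfc.best_params_)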

I get the following result.

{'criterion': 'gini', 'max_depth': 6, 'max_features': 'auto', 'n_estimators': 200}

After that, I refit the model with the tuned parameters and evaluate it on x_test as follows.

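from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

# Refit with the tuned parameters; note class_weight was not part of the grid search.
rfc = RandomForestClassifier(random_state=42, criterion='gini', max_depth=6, max_features='auto', n_estimators=200, class_weight='balanced')
rfc.fit(x_train, y_train)
pred = rfc.predict(x_test)
print(precision_recall_fscore_support(y_test, pred))
print(roc_auc_score(y_test, pred))  # AUC computed from hard labels, not probabilities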
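
For comparison, here is a minimal sketch of the same evaluation that reuses the model GridSearchCV has already refitted instead of retyping the parameters by hand. The synthetic data from `make_classification`, the reduced grid, and the names `search`, `best_model` and `proba` are assumptions made so the snippet runs on its own; binary labels are assumed as well.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary data stands in for df[features] / df['gold_standard'] (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Put class_weight on the estimator being searched, so the tuned model and the
# final model are configured the same way.
rfc = RandomForestClassifier(random_state=42, class_weight="balanced")
param_grid = {"n_estimators": [50, 100], "max_depth": [4, 6]}  # reduced grid for speed

search = GridSearchCV(rfc, param_grid, cv=5)
search.fit(x_train, y_train)

# refit=True (the default) retrains the best configuration on all of x_train.
best_model = search.best_estimator_
pred = best_model.predict(x_test)
proba = best_model.predict_proba(x_test)[:, 1]  # probability of the positive class

print(search.best_params_)
print(precision_recall_fscore_support(y_test, pred))
print(roc_auc_score(y_test, proba))  # AUC from probabilities rather than hard labels
```

Passing probabilities to `roc_auc_score` lets the metric use the full ranking of the test samples; with hard labels it collapses to a single operating point.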

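However, I am still unclear about how to use GridSearchCV with 10-fold cross-validation (i.e. not just applying the tuned parameters to x_test). That is, something like the following.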

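from sklearn.model_selection import StratifiedKFold

kf = StratifiedKFold(n_splits=10)
for fold, (train_index, test_index) in enumerate(kf.split(X, y), 1):
    # each fold yields its own train/test partition of X and y
    X_train = X[train_index]
    y_train = y[train_index]
    X_test = X[test_index]
    y_test = y[test_index]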
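
One way to combine this loop with GridSearchCV is nested cross-validation: the search runs inside each outer fold, and the held-out part of that fold is used only for scoring. The following is a sketch under assumptions, not a definitive recipe: the synthetic data, the small grid, the `roc_auc` scoring choice, and the names `outer_cv` and `outer_scores` are all illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic binary data stands in for the real X and y (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

param_grid = {"n_estimators": [50], "max_depth": [4, 6]}  # reduced grid for speed
outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

outer_scores = []
for fold, (train_index, test_index) in enumerate(outer_cv.split(X, y), 1):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Inner loop: tune hyperparameters on the training part of this fold only.
    search = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid,
        cv=5,
        scoring="roc_auc",
    )
    search.fit(X_train, y_train)

    # Outer loop: score the refitted best model on data the search never saw.
    proba = search.predict_proba(X_test)[:, 1]
    score = roc_auc_score(y_test, proba)
    outer_scores.append(score)
    print(f"fold {fold}: best_params={search.best_params_}, AUC={score:.3f}")

print(f"mean AUC over {len(outer_scores)} folds: {np.mean(outer_scores):.3f}")
```

The best parameters may differ from fold to fold. The outer scores estimate how well the whole tuning procedure generalizes, rather than the quality of one fixed parameter set.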

Or

Since GridSearchCV already uses cross-validation, can we pass all of X and y to it and take the best result as the final result?
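
To make this second option concrete, here is a sketch of what it would look like: GridSearchCV is fitted on all of the data with a 10-fold splitter, and `best_score_` is read back. The synthetic data, the small grid, and the `roc_auc` scoring choice are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic binary data stands in for the real X and y (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

param_grid = {"n_estimators": [50], "max_depth": [4, 6]}  # reduced grid for speed
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=cv,
    scoring="roc_auc",
)
search.fit(X, y)  # no separate hold-out set in this variant

best = search.best_index_
print(search.best_params_)
print(search.best_score_)                          # mean 10-fold score of the best setting
print(search.cv_results_["std_test_score"][best])  # spread of that score across folds
```

Here `best_score_` is the mean cross-validated score of the winning parameter set. Because the same folds were used to pick that set, the number tends to be somewhat optimistic as an estimate of performance on new data.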

I am happy to provide more details if needed.

Archived:	6 years, 9 months ago
Views:	105
Last activity:	6 years, 9 months ago