Tuning a Random Forest Regressor for the best parameters with GridSearchCV

Question

Amb*_*us9 7 python random-forest grid-search

I want to improve the parameters of this GridSearchCV for a Random Forest Regressor.

    # Assumption: r2 is scikit-learn's r2_score (the original snippet does not show this import)
    from sklearn.metrics import r2_score as r2

    def Grid_Search_CV_RFR(X_train, y_train):
        from sklearn.model_selection import GridSearchCV
        from sklearn.model_selection import ShuffleSplit
        from sklearn.ensemble import RandomForestRegressor

        estimator = RandomForestRegressor()
        param_grid = {
            "n_estimators": [10, 20, 30],
            "max_features": ["auto", "sqrt", "log2"],
            "min_samples_split": [2, 4, 8],
            "bootstrap": [True, False],
        }

        # Exhaustive search over param_grid with 5-fold cross-validation
        grid = GridSearchCV(estimator, param_grid, n_jobs=-1, cv=5)
        grid.fit(X_train, y_train)

        return grid.best_score_, grid.best_params_

    def RFR(X_train, X_test, y_train, y_test, best_params):
        from sklearn.ensemble import RandomForestRegressor

        # Refit a fresh forest with the best parameters and score it on the test set
        estimator = RandomForestRegressor(n_jobs=-1).set_params(**best_params)
        estimator.fit(X_train, y_train)
        y_predict = estimator.predict(X_test)
        print("R2 score:", r2(y_test, y_predict))
        return y_test, y_predict

    def splitter_v2(tab, y_indicator):
        from sklearn.model_selection import train_test_split

        # Assign X and y, dropping the y column from X
        # (correlacion is the asker's own helper, not shown here)
        X = correlacion(tab, y_indicator)
        y = tab[:, y_indicator]
        # Split X and y into train and test sets
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
        return X_train, X_test, y_train, y_test
I ran these functions 5 times with this code:

    for i in range(5):
        print("Loop:", i)
        print("--------------")
        X_train, X_test, y_train, y_test = splitter_v2(tabla, 1)
        best_score, best_params = Grid_Search_CV_RFR(X_train, y_train)
        y_test, y_predict = RFR(X_train, X_test, y_train, y_test, best_params)
        print("Best Score:", best_score)
        print("Best params:", best_params)
These are the results:

    Loop: 0
    --------------
    R2 score: 0.900071279487
    Best Score: 0.61802821072
    Best params: {'max_features': 'log2', 'min_samples_split': 2, 'bootstrap': False, 'n_estimators': 10}
    Loop: 1
    --------------
    R2 score: 0.993462885564
    Best Score: 0.671309726329
    Best params: {'max_features': 'log2', 'min_samples_split': 4, 'bootstrap': False, 'n_estimators': 10}
    Loop: 2
    --------------
    R2 score: -0.181378339338
    Best Score: -30.9012120698
    Best params: {'max_features': 'log2', 'min_samples_split': 4, 'bootstrap': True, 'n_estimators': 20}
    Loop: 3
    --------------
    R2 score: 0.750116663033
    Best Score: 0.71472985391
    Best params: {'max_features': 'log2', 'min_samples_split': 4, 'bootstrap': False, 'n_estimators': 30}
    Loop: 4
    --------------
    R2 score: 0.692075744759
    Best Score: 0.715012972471
    Best params: {'max_features': 'sqrt', 'min_samples_split': 2, 'bootstrap': True, 'n_estimators': 30}
Why do I get a different R2 score on every run? Is it because I chose cv=5? Is it because I did not set random_state=0 on my RandomForestRegressor()?

Answer 1

小智 0

    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.metrics import accuracy_score, recall_score, f1_score

    # models, X_train, X_test, y_train and y_test are assumed to be defined elsewhere
    for model in models:
        m = str(model)
        print(m)
        # Our Pipeline
        text_clf = Pipeline([('vect', CountVectorizer()),
                             ('tfidf', TfidfTransformer()),
                             ('clf', model),
        ])
        # Training
        text_clf = text_clf.fit(X_train.to_numpy(), y_train)
        # Prediction
        pred = text_clf.predict(X_test)
        # Metrics (scikit-learn expects y_true first, then y_pred)
        print('accuracy_score', accuracy_score(y_test, pred))
        print('recall_score', recall_score(y_test, pred, average="macro"))
        print('f1_score', f1_score(y_test, pred, average="macro"))

    # lr
    C = [1, 10, 25, 50, 100, 150]
    solver = ['newton-cg', 'sag', 'saga', 'lbfgs']

    # rfc
    n_estimators = [50, 100, 200, 300, 500]
    max_features = ["auto", "sqrt", "log2"]
    max_depth = [3, 6]

    # Knc
    n_neighbors = [5, 10, 15, 20]
    p = [1, 2]
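
On the random_state hypothesis raised in the question, a minimal sketch may help. It is not the asker's setup: the data comes from make_regression as a synthetic stand-in, the grid is smaller, and run_once is a hypothetical helper name. In the question's code, train_test_split and RandomForestRegressor are both called without random_state, so each loop iteration draws a new split and grows different trees. By contrast, an integer cv=5 with a regressor uses unshuffled K-fold splitting, so the folds are fixed once the training set is fixed. Seeding the two random steps should therefore make repeated runs agree:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split


def run_once(X, y, seed):
    """Split, grid-search and score once, with every random step seeded."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Reduced grid for speed; "auto" is left out because recent
    # scikit-learn releases no longer accept it for max_features.
    param_grid = {
        "n_estimators": [10, 20],
        "max_features": ["sqrt", "log2"],
        "min_samples_split": [2, 4],
    }
    grid = GridSearchCV(
        RandomForestRegressor(random_state=seed), param_grid, cv=5
    )
    grid.fit(X_train, y_train)
    # With the default refit=True, grid.predict uses the best estimator
    # refitted on the whole training set.
    test_r2 = r2_score(y_test, grid.predict(X_test))
    return test_r2, grid.best_score_, grid.best_params_


# Synthetic stand-in for the asker's data (an assumption, not the real table)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

first = run_once(X, y, seed=0)
second = run_once(X, y, seed=0)
other = run_once(X, y, seed=1)

print("seed 0, run 1:", first)
print("seed 0, run 2:", second)
print("seed 1:       ", other)
```

The two seed-0 runs should report the same test R2, cross-validation score and best parameters, while the seed-1 run will generally differ. That spread between seeds reflects how much the result depends on the particular 80/20 split, so a fixed seed makes the numbers repeatable but does not by itself make them more reliable.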


Archived:	8 years, 6 months ago
Views:	14118
Last activity:	4 years, 3 months ago