Yur*_*lin 5 python random-forest scikit-learn
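I am trying to do hyperparameter optimization for RandomForestClassifier. It looks like RandomizedSearchCV is 14 times slower than an equivalent set of RandomForestClassifier runs.
The two examples below use the same training data and the same number of folds (6). Example #1 is a classic RandomForestClassifier() fit run through cross_val_score. Example #2 is RandomizedSearchCV() run on a 1-point random_grid.
Run time: 1 min 8 s versus 14 min 13 s. What am I missing?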
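%%time
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Example #1: cross-validate a default RandomForestClassifier on 6 time-series folds
n_fold = 6
time_split = TimeSeriesSplit(n_splits=n_fold)
clf = RandomForestClassifier()
cv_scores = cross_val_score(clf, X, y, cv=time_split, scoring='roc_auc', n_jobs=-1)
# CPU times: user 410 ms, sys: 868 ms, total: 1.28 s
# Wall time: 1min 8s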
%%time
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Example #2: randomized search over a grid that contains a single point
print(random_grid)
n_fold = 6
rf = RandomForestClassifier()
rf_random = RandomizedSearchCV(
    estimator=rf,
    param_distributions=random_grid,
    n_iter=6,
    scoring='roc_auc',
    cv=n_fold,
    verbose=True,
    random_state=42,
    n_jobs=-1,
)
rf_random.fit(X, y)
best_random = rf_random.best_estimator_
# {'n_estimators': [200], 'max_features': ['auto'], 'max_depth': [10], 'min_samples_split': [5], 'min_samples_leaf': [2], 'bootstrap': [True]}
# Fitting 6 folds for each of 1 candidates, totalling 6 fits
# CPU times: user 5min 15s, sys: 4.73 s, total: 5min 20s
# Wall time: 14min 13s
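As written, the two examples differ in more than the search wrapper, so the timings may not be comparing like with like. Example #1 fits a default RandomForestClassifier, while the grid in Example #2 asks for 200 trees with max_depth 10. Example #1 passes a TimeSeriesSplit, whose training sets are growing prefixes of the data, while cv=6 in Example #2 is an integer, which scikit-learn turns into stratified K-fold for a classifier, so every fit trains on about 5/6 of the rows. RandomizedSearchCV also defaults to refit=True, which adds one more fit on the full data set. The post does not establish whether these differences explain the whole gap. The following is only a sketch of a like-for-like comparison. The synthetic X and y stand in for the asker's data, which is not shown, and 'sqrt' replaces 'auto' (its meaning for classifiers) because newer scikit-learn releases no longer accept 'auto'.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit, cross_val_score

# Assumption: synthetic stand-in for the question's X, y, which are not shown.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# Hyperparameters taken from the single-point random_grid in the question,
# with 'sqrt' substituted for 'auto'.
params = {
    "n_estimators": 200,
    "max_features": "sqrt",
    "max_depth": 10,
    "min_samples_split": 5,
    "min_samples_leaf": 2,
    "bootstrap": True,
}
random_grid = {name: [value] for name, value in params.items()}

# One splitter shared by both runs, so both see identical folds.
time_split = TimeSeriesSplit(n_splits=6)

# Run 1: plain cross-validation, using the same hyperparameters as the grid.
# n_jobs=1 keeps the sketch portable; the question uses n_jobs=-1.
clf = RandomForestClassifier(random_state=42, **params)
cv_scores = cross_val_score(clf, X, y, cv=time_split, scoring="roc_auc", n_jobs=1)

# Run 2: the search over the same single point, same folds, and no final refit.
search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=random_grid,
    n_iter=1,          # the grid holds exactly one candidate
    scoring="roc_auc",
    cv=time_split,
    refit=False,       # skip the extra fit on the full data set
    random_state=42,
    n_jobs=1,
)
search.fit(X, y)
search_mean = search.cv_results_["mean_test_score"][0]

print("cross_val_score mean:", cv_scores.mean())
print("RandomizedSearchCV mean:", search_mean)
```

With the hyperparameters, folds, and refit setting aligned, both runs perform the same six fits, so timing each of them on the real data should show whether any overhead remains that is attributable to RandomizedSearchCV itself.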