Sklearn, gridsearch: how to print out progress during the execution?

dou*_*bts 58 python logging scikit-learn

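I am using GridSearch from sklearn to optimize the parameters of a classifier. There is a lot of data, so the whole optimization takes a while: more than a day. I would like to watch the performance of the parameter combinations that have already been tried while it is still running. Is that possible?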

Dav*_*idS 76

Set the verbose parameter of GridSearchCV to a positive number (the larger the number, the more detail you get). For example:

GridSearchCV(clf, param_grid, cv=cv, scoring='accuracy', verbose=10)  

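  • Just to add: if you are using IPython Notebook, the output goes to the IPython terminal window, not to the interactive session. (19 upvotes)
  • What is the highest value of this parameter that still makes a difference? The documentation only says "the higher, the more messages". So how high can we go and still get more messages? (6 upvotes)
  • As Arturo says below, "verbose=2 is a good choice for most practical purposes. It returns one line per parameter set (including CV)" (4 upvotes)
  • On my system I had to set "n_jobs=1" (the default), otherwise no messages showed up in JupyterLab. (4 upvotes)
  • The highest useful value is verbose=3, which is nice because it shows the parameters tested in each fit and, most importantly, the score for that specific set of parameters as the search progresses. Maybe 10 did something back in 2014, lol, but nowadays nothing above 3 makes a difference. (3 upvotes)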
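
For anyone who wants to try this quickly, here is a minimal self-contained sketch of the answer above. The synthetic dataset, the SVC estimator and the small grid are assumptions chosen only so that it runs in a few seconds; they are not from the original question. The exact wording of the progress messages depends on the installed scikit-learn version.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Small synthetic dataset, used only so the example runs quickly (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Illustrative grid: 2 x 2 = 4 candidates.
param_grid = {"C": [0.1, 1.0], "kernel": ["linear", "rbf"]}

search = GridSearchCV(
    SVC(),
    param_grid,
    cv=3,
    scoring="accuracy",
    verbose=3,  # positive value: progress messages are printed while fit() runs
    n_jobs=1,   # sequential; a commenter above needed this to see output in JupyterLab
)
search.fit(X, y)

# After the search, the per-candidate results are also available programmatically.
print(search.best_params_)
print(len(search.cv_results_["params"]))
```

With 4 candidates and 3 folds this performs 12 fits, plus one final refit on the full data.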

Art*_*uro 21

I just want to add to DavidS's answer.

To give you an idea, for a very simple case, this is what the output looks like with verbose=1:

Fitting 10 folds for each of 1 candidates, totalling 10 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  10 out of  10 | elapsed:  1.2min finished

And this is what it looks like with verbose=10:

Fitting 10 folds for each of 1 candidates, totalling 10 fits
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total=   7.1s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    7.0s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.630, total=   6.5s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   13.5s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total=   6.5s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:   20.0s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total=   6.7s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:   26.7s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.632, total=   7.9s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:   34.7s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.622, total=   6.9s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:   41.6s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.627, total=   7.1s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:   48.7s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.628, total=   7.2s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:   55.9s remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.640, total=   6.6s
[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 
[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:  1.0min remaining:    0.0s
[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.629, total=   6.6s
[Parallel(n_jobs=1)]: Done  10 out of  10 | elapsed:  1.2min finished

In my case, verbose=1 does the trick.

  • In my opinion, "verbose=2" is a good choice for most practical purposes. It returns one line per parameter set (including CV). (2 upvotes)
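
If the verbose output is redirected to a file, the scores seen so far can be summarized while the search is still running. Below is a rough sketch that parses lines in the format shown in the verbose=10 log above. The regular expression and the summarize helper are my own assumptions, not part of scikit-learn, and other scikit-learn versions format these lines differently, so the pattern may need adjusting.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches only the "finished" [CV] lines, i.e. the ones that carry a score.
# The pattern is tailored to the log format quoted above (assumption).
LINE_RE = re.compile(
    r"^\[CV\]\s+(?P<params>.*?), score=(?P<score>-?\d+(?:\.\d+)?), total="
)


def summarize(log_lines):
    """Return {parameter string: (folds finished, mean score so far)}."""
    scores = defaultdict(list)
    for line in log_lines:
        match = LINE_RE.match(line)
        if match:
            scores[match.group("params")].append(float(match.group("score")))
    return {params: (len(vals), mean(vals)) for params, vals in scores.items()}


# A few lines copied from the log above.
sample = [
    "[CV] booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1 ",
    "[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.637, total=   7.1s",
    "[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    7.0s remaining:    0.0s",
    "[CV]  booster=gblinear, learning_rate=0.0001, max_depth=3, n_estimator=100, subsample=0.1, score=0.630, total=   6.5s",
]

summary = summarize(sample)
for params, (n_folds, avg) in summary.items():
    print(f"{n_folds} folds done, mean score {avg:.4f}: {params}")
```

In practice the list would be replaced by the lines of the log file, read again whenever you want an update.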

O.r*_*rka 6

Check this out:

https://pactools.github.io/auto_examples/plot_grid_search.html?highlight=gridsearchcvprogressbar

I just found it and I am using it. Really like it:

In [1]: GridSearchCVProgressBar
Out[1]: pactools.grid_search.GridSearchCVProgressBar

In [2]:

In [2]: ??GridSearchCVProgressBar
Init signature: GridSearchCVProgressBar(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')
Source:
class GridSearchCVProgressBar(model_selection.GridSearchCV):
    """Monkey patch Parallel to have a progress bar during grid search"""

    def _get_param_iterator(self):
        """Return ParameterGrid instance for the given param_grid"""

        iterator = super(GridSearchCVProgressBar, self)._get_param_iterator()
        iterator = list(iterator)
        n_candidates = len(iterator)

        cv = model_selection._split.check_cv(self.cv, None)
        n_splits = getattr(cv, 'n_splits', 3)
        max_value = n_candidates * n_splits

        class ParallelProgressBar(Parallel):
            def __call__(self, iterable):
                bar = ProgressBar(max_value=max_value, title='GridSearchCV')
                iterable = bar(iterable)
                return super(ParallelProgressBar, self).__call__(iterable)

        # Monkey patch
        model_selection._search.Parallel = ParallelProgressBar

        return iterator
File:           ~/anaconda/envs/python3/lib/python3.6/site-packages/pactools/grid_search.py
Type:           ABCMeta

In [3]: ?GridSearchCVProgressBar
Init signature: GridSearchCVProgressBar(estimator, param_grid, scoring=None, fit_params=None, n_jobs=1, iid=True, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', error_score='raise', return_train_score='warn')
Docstring:      Monkey patch Parallel to have a progress bar during grid search
File:           ~/anaconda/envs/python3/lib/python3.6/site-packages/pactools/grid_search.py
Type:           ABCMeta

  • This only prints to stderr and does not show up in Spyder or IPython Notebook. (2 upvotes)
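
The progress bar in the class above runs up to max_value = n_candidates * n_splits, which is the same number the verbose log reports as "totalling ... fits". A small sketch of that arithmetic for a single dict-style grid follows. The count_fits helper and the example grid are hypothetical, and a list of several grids is not handled here.

```python
from math import prod


def count_fits(param_grid, n_splits):
    """Return (n_candidates, total fits) for one dict-style parameter grid."""
    # Every combination of values is one candidate.
    n_candidates = prod(len(values) for values in param_grid.values())
    # Each candidate is fitted once per cross-validation fold.
    return n_candidates, n_candidates * n_splits


# Hypothetical grid reusing parameter names from the log above.
grid = {
    "learning_rate": [0.0001, 0.001, 0.01],
    "max_depth": [3, 5],
    "subsample": [0.1, 0.5],
}

n_candidates, n_fits = count_fits(grid, n_splits=10)
print(f"Fitting 10 folds for each of {n_candidates} candidates, totalling {n_fits} fits")
```

Multiplying the total by the time of one fit from the log gives a rough estimate of how long the whole search will take.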