What should I use instead of Bootstrap?

Gin*_*ger 9 python scikit-learn

When I run this code:

from sklearn import cross_validation
bs = cross_validation.Bootstrap(9, random_state=0)

I get this deprecation warning:

C:\Anaconda\envs\p33\lib\site-packages\sklearn\cross_validation.py:684: DeprecationWarning: Bootstrap will no longer be supported as a cross-validation method as of version 0.15 and will be removed in 0.17
  "will be removed in 0.17", DeprecationWarning)

What should I use instead of Bootstrap?

Mat*_*all 5

From the scikit-learn 0.15 release notes, under "API changes summary":

And from the source code itself:

# See, e.g., http://youtu.be/BzHz0J9a6k0?t=9m38s for a motivation
# behind this deprecation
warnings.warn("Bootstrap will no longer be supported as a " +
              "cross-validation method as of version 0.15 and " +
              "will be removed in 0.17", DeprecationWarning)
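
If I recall the 0.15 release notes correctly, they point to KFold or ShuffleSplit as replacements. Here is a minimal sketch of the ShuffleSplit route, assuming scikit-learn 0.18 or later, where the splitters live in sklearn.model_selection. The toy array X and the choice of three splits are made up for illustration. Note that ShuffleSplit draws without replacement, so it is a substitute for the cross-validation use of Bootstrap, not a bootstrap itself.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit  # assumes scikit-learn >= 0.18

# Toy data (illustrative only): 9 samples, matching Bootstrap(9, ...)
X = np.arange(18).reshape(9, 2)

# Random train/test partitions, drawn WITHOUT replacement
ss = ShuffleSplit(n_splits=3, test_size=0.25, random_state=0)
splits = list(ss.split(X))

for train_idx, test_idx in splits:
    # Each iteration gives disjoint index arrays into X
    print("train:", train_idx, "test:", test_idx)
```

The index arrays can be fed to any estimator in the usual way, e.g. fit on X[train_idx] and score on X[test_idx].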

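  • I edited this answer to add the link to the lecture. However, I still don't think it addresses the "why" behind the OP's question. Bootstrap is a good way to deal with degrees-of-freedom problems. Also, validating on out-of-bootstrap data is similar to using cross-validation. So I don't understand why Bootstrap was deprecated either. It does have a real and important use case, IMHO. (5 upvotes)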
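
For the out-of-bootstrap validation idea mentioned in the comment above, the index bookkeeping is simple enough to do by hand. The following is only a sketch using the standard library: bootstrap_splits is a hypothetical helper, not a scikit-learn function and not a reimplementation of the removed Bootstrap class. Each round draws n_samples indices with replacement and treats the indices that were never drawn as the validation set.

```python
import random

def bootstrap_splits(n_samples, n_iter=3, seed=0):
    """Yield (in_bag, out_of_bag) index lists for n_iter bootstrap rounds.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    rng = random.Random(seed)
    for _ in range(n_iter):
        # Bootstrap sample: n_samples indices drawn WITH replacement
        in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
        drawn = set(in_bag)
        # Indices that were never drawn form the out-of-bag set
        out_of_bag = [i for i in range(n_samples) if i not in drawn]
        yield in_bag, out_of_bag

splits = list(bootstrap_splits(9, n_iter=3, seed=0))
for in_bag, out_of_bag in splits:
    print("in-bag:", sorted(in_bag), "out-of-bag:", out_of_bag)
```

With very small n_samples the out-of-bag list can occasionally be empty, so a real loop should probably skip such rounds before scoring.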

小智 5

You can use BaggingClassifier

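import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import recall_score

# your_estimator, X and y are placeholders for your own classifier and
# NumPy arrays. The estimator is passed positionally because its keyword
# name differs between scikit-learn versions (base_estimator vs. estimator).
bag = BaggingClassifier(your_estimator,
                        n_estimators=100,
                        max_samples=1.0,
                        bootstrap=True,
                        n_jobs=-1)
bag.fit(X, y)
recalls = []
for estimator, samples in zip(bag.estimators_, bag.estimators_samples_):
    # build the out-of-bag mask; this works whether `samples` holds
    # integer indices (newer releases) or a boolean mask (older releases)
    mask = np.ones(len(X), dtype=bool)
    mask[samples] = False
    y_pred = estimator.predict(X[mask])
    # compute some statistic on the out-of-bag predictions
    # (recall_score defaults to binary labels)
    recalls.append(recall_score(y[mask], y_pred))
# Do something with stats, e.g. find confidence interval
print(np.percentile(recalls, [2.5, 97.5]))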