Why does CalibratedClassifierCV underperform a direct classifier?

Rya*_*yan 16 python scikit-learn

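I noticed that sklearn's new CalibratedClassifierCV seems to underperform the base_estimator used directly when the base_estimator is GradientBoostingClassifier (I haven't tested other classifiers). Interestingly, if the make_classification parameters are: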

n_features = 10
n_informative = 3
n_classes = 2

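then CalibratedClassifierCV seems to come out slightly ahead (evaluated by log loss).

However, on the following classification dataset, CalibratedClassifierCV generally seems to be the underperformer: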

from sklearn.datasets import make_classification
from sklearn import ensemble
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from sklearn import cross_validation
# Build a 9-class classification task with 30 informative features out of 100

X, y = make_classification(n_samples=1000,
                           n_features=100,
                           n_informative=30,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=9,
                           random_state=0,
                           shuffle=False)

skf = cross_validation.StratifiedShuffleSplit(y, 5)

for train, test in skf:

    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=3, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    probas = clf.predict_proba(X_test)
    clf_score = log_loss(y_test, probas) 

    print 'calibrated score:', cv_score
    print 'direct clf score:', clf_score
    print

One run produced:

[image: log loss output from one run]

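Maybe I'm missing something about how CalibratedClassifierCV works, or I'm not using it correctly, but my impression was that, if anything, passing a classifier through CalibratedClassifierCV would improve performance relative to the base_estimator alone.

Can anyone explain this observed underperformance?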

小智 10

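Probability calibration itself requires cross-validation, so CalibratedClassifierCV trains one calibrated classifier per fold (using StratifiedKFold in this case) and, when predict_proba() is called, takes the mean of the probabilities predicted by each of those classifiers. This could explain the effect.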
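The per-fold averaging described above can be checked directly. The following is a minimal sketch, not the script from the question: it assumes a recent scikit-learn (where `sklearn.model_selection` has replaced `cross_validation` and `CalibratedClassifierCV` accepts `ensemble=True`), and it swaps in `LogisticRegression` on a small made-up binary dataset purely so that it runs quickly.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic binary problem (sizes are arbitrary, chosen to run quickly).
X, y = make_classification(n_samples=600, n_features=10, n_informative=3,
                           random_state=0)
X_train, y_train, X_new = X[:500], y[:500], X[500:]

# ensemble=True keeps one (classifier, calibrator) pair per fold.
clf_cv = CalibratedClassifierCV(LogisticRegression(max_iter=1000),
                                cv=3, method='isotonic', ensemble=True)
clf_cv.fit(X_train, y_train)

# Number of fitted pairs: one per fold.
n_pairs = len(clf_cv.calibrated_classifiers_)

# Average the per-fold predictions by hand and compare with the wrapper.
manual_mean = np.mean(
    [pair.predict_proba(X_new) for pair in clf_cv.calibrated_classifiers_],
    axis=0)
same = np.allclose(manual_mean, clf_cv.predict_proba(X_new))

print('pairs:', n_pairs, 'wrapper equals manual mean:', same)
```

Each pair was trained on only part of the training data, which is the point the hypothesis below builds on.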

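My hypothesis is that if the training set is small relative to the number of features and classes, the reduced training set available to each sub-classifier hurts performance, and the ensembling does not make up for it (or makes it worse). Also, GradientBoostingClassifier may already provide quite good probability estimates from the start, because its loss function is optimized for probability estimation.

If that is correct, an ensemble of classifiers built the same way as in CalibratedClassifierCV, but without calibration, should be worse than the single classifier. Also, the effect should disappear when a larger number of folds is used for calibration.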
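To make the "reduced training set" argument concrete, here is a small sketch of how many samples each per-fold classifier sees for 3 versus 10 folds. The label vector is hypothetical: 900 samples in 9 balanced classes, roughly the training portion of the 1000-sample dataset in the question. The helper name `mean_train_size` is introduced only for this illustration, and the modern `sklearn.model_selection` API is assumed.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def mean_train_size(y, n_folds):
    """Average number of samples each per-fold classifier is trained on."""
    X_dummy = np.zeros((len(y), 1))  # split() only needs the number of rows
    skf = StratifiedKFold(n_splits=n_folds)
    sizes = [len(train_idx) for train_idx, _ in skf.split(X_dummy, y)]
    return float(np.mean(sizes))

# Hypothetical labels: 900 samples, 9 balanced classes.
y_demo = np.repeat(np.arange(9), 100)

size_3 = mean_train_size(y_demo, 3)    # each classifier sees 2/3 of the data
size_10 = mean_train_size(y_demo, 10)  # each classifier sees 9/10 of the data
print(size_3, size_10)
```

With 3 folds every sub-classifier loses a third of the training data; with 10 folds it loses only a tenth, which is consistent with the expectation that the effect shrinks as the number of folds grows.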

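To test this, I extended your script to increase the number of folds and to include an ensemble classifier without calibration, and I was able to confirm my predictions. The 10-fold calibrated classifier always performed better than the single classifier, and the uncalibrated ensemble was significantly worse. In my run, the 3-fold calibrated classifier was also not really worse than the single classifier, so this might be an unstable effect as well. These are the detailed results on the same dataset:

[image: log loss results from cross-validation]

Here is the code from my experiment: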

import numpy as np
from sklearn.datasets import make_classification
from sklearn import ensemble
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from sklearn import cross_validation

X, y = make_classification(n_samples=1000,
                           n_features=100,
                           n_informative=30,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=9,
                           random_state=0,
                           shuffle=False)

skf = cross_validation.StratifiedShuffleSplit(y, 5)

for train, test in skf:

    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=3, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)
    print 'calibrated score (3-fold):', cv_score


    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=10, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)
    print 'calibrated score (10-fold):', cv_score

    # Train 3 classifiers on folds of the training set and average their probabilities
    skf2 = cross_validation.StratifiedKFold(y_train, 3)
    probas_list = []
    for sub_train, sub_test in skf2:
        X_sub_train, X_sub_test = X_train[sub_train], X_train[sub_test]
        y_sub_train, y_sub_test = y_train[sub_train], y_train[sub_test]
        clf = ensemble.GradientBoostingClassifier(n_estimators=100)
        clf.fit(X_sub_train, y_sub_train)
        probas_list.append(clf.predict_proba(X_test))
    probas = np.mean(probas_list, axis=0)
    clf_ensemble_score = log_loss(y_test, probas)
    print 'uncalibrated ensemble clf (3-fold) score:', clf_ensemble_score

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    probas = clf.predict_proba(X_test)
    score = log_loss(y_test, probas)
    print 'direct clf score:', score
    print


blu*_*ena 7

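The isotonic regression approach (and its implementation in sklearn) has some issues that make it a suboptimal choice for calibration.

Specifically:

1) It fits a piecewise constant function rather than a smoothly varying curve for the calibration function.

2) The cross-validation averages the model/calibration results obtained from each fold. However, each of those results is still fit and calibrated only on its respective folds.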
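Point 1 can be illustrated with sklearn's `IsotonicRegression` on its own. The sketch below uses made-up scores and outcomes (an assumption for illustration, not data from the question) and counts how many distinct calibrated values the fit assigns to the training scores.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.RandomState(0)

# Hypothetical uncalibrated scores, with binary outcomes whose true
# probability of being 1 is score**2 (a smooth curve).
scores = rng.uniform(0, 1, size=300)
outcomes = (rng.uniform(0, 1, size=300) < scores ** 2).astype(float)

iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds='clip')
fitted = iso.fit(scores, outcomes).predict(scores)

# The fit pools neighbouring points into flat steps, so the 300 training
# scores map onto far fewer distinct calibrated values.
n_levels = len(np.unique(np.round(fitted, 12)))

# The fitted values never decrease as the score increases.
order = np.argsort(scores)
is_monotone = bool(np.all(np.diff(fitted[order]) >= 0))

print('distinct levels:', n_levels, 'monotone:', is_monotone)
```

Even though the true relationship here is smooth, the calibrated values form a staircase, which is the behaviour the first point refers to.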

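Often, a better choice is the SplineCalibratedClassifierCV class in the ML-insights package (disclaimer: I am an author of that package). The GitHub repo for the package is here.

It has the following advantages:

1) It fits a cubic smoothing spline rather than a piecewise constant function.

2) It uses the entire (cross-validated) set of answers for calibration and refits the base model on the full data set. Thus, both the calibration function and the base model are effectively trained on the full data set.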
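The design in point 2 (pool the cross-validated predictions, fit one calibrator, refit the base model on everything) can be sketched with plain scikit-learn. This is not the ML-insights implementation: as a stand-in for the cubic smoothing spline it uses a logistic regression on the log-odds (Platt-style), and the dataset, sizes, and the helper `to_logit` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_classification(n_samples=1500, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

base = GradientBoostingClassifier(n_estimators=50, random_state=0)

# Step 1: out-of-fold probabilities for every training row.
oof = cross_val_predict(base, X_train, y_train, cv=5,
                        method='predict_proba')[:, 1]

def to_logit(p, eps=1e-6):
    """Map probabilities to log-odds as a single-column feature matrix."""
    p = np.clip(p, eps, 1 - eps)
    return np.log(p / (1 - p)).reshape(-1, 1)

# Step 2: ONE calibrator fit on the pooled out-of-fold predictions
# (stand-in for the spline; weak regularization via a large C).
calibrator = LogisticRegression(C=1e6, max_iter=1000)
calibrator.fit(to_logit(oof), y_train)

# Step 3: refit the base model on the full training set.
base.fit(X_train, y_train)

raw = base.predict_proba(X_test)[:, 1]
calibrated = calibrator.predict_proba(to_logit(raw))[:, 1]
print('raw log loss:', log_loss(y_test, raw))
print('calibrated log loss:', log_loss(y_test, calibrated))
```

The contrast with the per-fold ensemble is that neither the base model nor the calibrator is restricted to a subset of the training data at prediction time.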

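You can see comparison examples here and here.

From the first example, here is a graph showing the binned probabilities of a training set (red dots) and an independent test set (green + signs), along with the calibrations computed by the ML-insights spline method (blue line) and the isotonic-sklearn method (gray dots/line).

[image: spline vs. isotonic calibration]

I modified your code to compare the methods (and increased the number of examples). It demonstrates that the spline approach typically performs better (as do the examples linked above).

Here is the code and the results:

Code (you will have to pip install ml_insights first):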


import numpy as np
from sklearn.datasets import make_classification
from sklearn import ensemble
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from sklearn import cross_validation
import ml_insights as mli

X, y = make_classification(n_samples=10000,
                           n_features=100,
                           n_informative=30,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=9,
                           random_state=0,
                           shuffle=False)

skf = cross_validation.StratifiedShuffleSplit(y, 5)

for train, test in skf:

    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)    
    clf_cv_mli = mli.SplineCalibratedClassifierCV(clf, cv=3)
    clf_cv_mli.fit(X_train, y_train)
    probas_cv_mli = clf_cv_mli.predict_proba(X_test)
    cv_score_mli = log_loss(y_test, probas_cv_mli)

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)    
    clf_cv = CalibratedClassifierCV(clf, cv=3, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    probas = clf.predict_proba(X_test)
    clf_score = log_loss(y_test, probas) 

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)    
    clf_cv_mli = mli.SplineCalibratedClassifierCV(clf, cv=10)
    clf_cv_mli.fit(X_train, y_train)
    probas_cv_mli = clf_cv_mli.predict_proba(X_test)
    cv_score_mli_10 = log_loss(y_test, probas_cv_mli)

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)    
    clf_cv = CalibratedClassifierCV(clf, cv=10, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score_10 = log_loss(y_test, probas_cv)

    print('\nuncalibrated score: {}'.format(clf_score))
    print('\ncalibrated score isotonic-sklearn (3-fold): {}'.format(cv_score))
    print('calibrated score mli (3-fold): {}'.format(cv_score_mli))
    print('\ncalibrated score isotonic-sklearn (10-fold): {}'.format(cv_score_10))
    print('calibrated score mli (10-fold): {}\n'.format(cv_score_mli_10))

Results:

[results output not preserved]

use*_*942 5

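The point of using a calibrated classifier is to come up with probability predictions that behave a bit more smoothly than those of a normal classifier. It is not meant to improve the performance of your base estimator.

So there is no guarantee that the probabilities or the log loss will be the same (same neighborhood, but not identical). But if you plot your samples against their probabilities, you will probably see a much nicer distribution.

What is mostly preserved is the number of samples above and below the decision threshold (0.5).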
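One way to check both statements on your own data is to compute reliability-curve points and count the predictions on each side of 0.5, before and after calibration. The sketch below assumes a recent scikit-learn and a made-up binary dataset; it only produces the numbers to inspect and does not assert that the counts match.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

raw_clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
raw_clf.fit(X_train, y_train)

cal_clf = CalibratedClassifierCV(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    cv=3, method='isotonic')
cal_clf.fit(X_train, y_train)

p_raw = raw_clf.predict_proba(X_test)[:, 1]
p_cal = cal_clf.predict_proba(X_test)[:, 1]

# Reliability data: observed fraction of positives vs. mean predicted
# probability in each (non-empty) bin. Plot these to see the distribution.
frac_pos_raw, mean_pred_raw = calibration_curve(y_test, p_raw, n_bins=10)
frac_pos_cal, mean_pred_cal = calibration_curve(y_test, p_cal, n_bins=10)

# Number of test samples predicted above the 0.5 decision threshold.
n_above_raw = int(np.sum(p_raw > 0.5))
n_above_cal = int(np.sum(p_cal > 0.5))
print('above 0.5 (raw, calibrated):', n_above_raw, n_above_cal)
```

Plotting `mean_pred_*` against `frac_pos_*` gives the usual reliability diagram, where a well-calibrated model lies close to the diagonal.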