Cross-validation on XGBClassifier for multiclass classification in Python

Pie*_*chi 5 python classification cross-validation xgboost

I am trying to run cross-validation on an XGBClassifier for a multiclass classification problem, using the following code adapted from http://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/

import numpy as np
import pandas as pd
import xgboost as xgb
from xgboost.sklearn import  XGBClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn import cross_validation, metrics
from sklearn.grid_search import GridSearchCV
import matplotlib.pyplot as plt  # needed for plt.ylabel in modelFit


def modelFit(alg, X, y, useTrainCV=True, cvFolds=5, early_stopping_rounds=50):
    if useTrainCV:
        xgbParams = alg.get_xgb_params()
        xgTrain = xgb.DMatrix(X, label=y)
        cvresult = xgb.cv(xgbParams,
                      xgTrain,
                      num_boost_round=alg.get_params()['n_estimators'],
                      nfold=cvFolds,
                      stratified=True,
                      metrics={'mlogloss'},
                      early_stopping_rounds=early_stopping_rounds,
                      seed=0,
                      callbacks=[xgb.callback.print_evaluation(show_stdv=False),
                                 xgb.callback.early_stop(3)])

        print(cvresult)
        alg.set_params(n_estimators=cvresult.shape[0])

    # Fit the algorithm
    alg.fit(X, y, eval_metric='mlogloss')

    # Predict
    dtrainPredictions = alg.predict(X)
    dtrainPredProb = alg.predict_proba(X)

    # Print model report:
    print("\nModel Report")
    print("Classification report: \n")
    # Report on the training data that was just predicted
    print(metrics.classification_report(y, dtrainPredictions))
    print("Accuracy : %.4g" % metrics.accuracy_score(y, dtrainPredictions))
    print("Log Loss Score (Train): %f" % metrics.log_loss(y, dtrainPredProb))
    feat_imp = pd.Series(alg.booster().get_fscore()).sort_values(ascending=False)
    feat_imp.plot(kind='bar', title='Feature Importances')
    plt.ylabel('Feature Importance Score')


# 1) Read training set
print('>> Read training set')
train = pd.read_csv(trainFile)

# 2) Extract target attribute and convert to numeric
print('>> Preprocessing')
y_train = train['OutcomeType'].values
le_y = LabelEncoder()
y_train = le_y.fit_transform(y_train)
train.drop('OutcomeType', axis=1, inplace=True)

# 4) Extract features and target from training set
X_train = train.values

# 5) First classifier
# Named xgb1 so it does not shadow the xgboost module imported as xgb
xgb1 = XGBClassifier(learning_rate=0.1,
                     n_estimators=1000,
                     max_depth=5,
                     min_child_weight=1,
                     gamma=0,
                     subsample=0.8,
                     colsample_bytree=0.8,
                     scale_pos_weight=1,
                     objective='multi:softprob',
                     seed=27)

modelFit(xgb1, X_train, y_train)

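where y_train contains labels from 0 to 4. However, when I run this code, I get the following error from the xgb.cv function: xgboost.core.XGBoostError: value 0for Parameter num_class should be greater equal to 1. In the XGBoost documentation I read that in the multiclass case xgb infers the number of classes from the labels in the target vector, so I don't understand what is going on.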

小智 3

You have to add the parameter 'num_class' to the xgb_param dictionary. This is also mentioned in the parameter description and in the comments on the link you provided above.
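A minimal sketch of that fix, assuming the labels are already integer-encoded (as LabelEncoder produces) and that every class appears at least once in y. The helper names with_num_class and cv_with_num_class are hypothetical, not part of XGBoost; the callbacks from the question are left out for brevity.

```python
def with_num_class(xgb_params, y):
    """Return a copy of xgb_params with 'num_class' set from the labels in y.

    Assumption: y holds integer class labels and every class occurs in it.
    """
    params = dict(xgb_params)          # copy, so the caller's dict is untouched
    params['num_class'] = len(set(y))  # number of distinct labels
    return params


def cv_with_num_class(alg, X, y, cv_folds=5, early_stopping_rounds=50):
    """Run xgboost.cv with the parameters of an XGBClassifier plus num_class."""
    import xgboost  # imported here so the helper above has no dependencies

    params = with_num_class(alg.get_xgb_params(), y)
    dtrain = xgboost.DMatrix(X, label=y)
    return xgboost.cv(params,
                      dtrain,
                      num_boost_round=alg.get_params()['n_estimators'],
                      nfold=cv_folds,
                      stratified=True,
                      metrics='mlogloss',
                      early_stopping_rounds=early_stopping_rounds,
                      seed=0)
```

Inside modelFit, this would amount to setting xgbParams['num_class'] right after calling alg.get_xgb_params() and before calling xgb.cv. The later alg.fit call does not need it, since the scikit-learn wrapper works out the number of classes from y itself.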


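  • @LetsPlayYahtzee `no explanation on how to set the parameter`: like this, `xgb_param['num_class'] = k  # k = number of classes`. That is the answer to the original question. `The error related to sklearn` is **not** the error in the OP's question, but one in **another person's comment**. (4 upvotes)
  • This is a completely wrong answer! The API (http://xgboost.readthedocs.io/en/latest//python/python_api.html#module-xgboost.sklearn) says that {num_class} is not one of the parameters! How did he set it? (2 upvotes)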