I would like to know whether I can do calibration in xgboost. More specifically, does xgboost come with an existing calibration implementation like scikit-learn does, or is there some way to feed a model from xgboost into scikit-learn's CalibratedClassifierCV?
As far as I know, in sklearn this is the common procedure:
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss

# Train random forest classifier, calibrate on validation data and evaluate
# on test data
clf = RandomForestClassifier(n_estimators=25)
clf.fit(X_train, y_train)
clf_probs = clf.predict_proba(X_test)
sig_clf = CalibratedClassifierCV(clf, method="sigmoid", cv="prefit")
sig_clf.fit(X_valid, y_valid)
sig_clf_probs = sig_clf.predict_proba(X_test)
sig_score = log_loss(y_test, sig_clf_probs)
print("Calibrated score is", sig_score)
If I put an xgboost tree model into CalibratedClassifierCV, it (of course) throws an error:
RuntimeError: classifier has no decision_function or predict_proba method.
Is there a way to integrate scikit-learn's excellent calibration module with xgboost?
Your insightful thoughts would be appreciated!
To answer my own question: xgboost GBTs can be integrated with scikit-learn by writing a wrapper class, as in the example below.
import numpy as np
import xgboost as xgb
from sklearn.metrics import log_loss


class XGBoostClassifier():
    def __init__(self, num_boost_round=10, **params):
        self.clf = None
        self.num_boost_round = num_boost_round
        self.params = params
        self.params.update({'objective': 'multi:softprob'})

    def fit(self, X, y, num_boost_round=None):
        num_boost_round = num_boost_round or self.num_boost_round
        self.label2num = dict((label, i) for i, label in enumerate(sorted(set(y))))
        # Newer scikit-learn versions look up classes_ on a prefit estimator.
        self.classes_ = np.array(sorted(set(y)))
        dtrain = xgb.DMatrix(X, label=[self.label2num[label] for label in y])
        self.clf = xgb.train(params=self.params, dtrain=dtrain,
                             num_boost_round=num_boost_round)
        return self

    def predict(self, X):
        num2label = dict((i, label) for label, i in self.label2num.items())
        Y = self.predict_proba(X)
        y = np.argmax(Y, axis=1)
        return np.array([num2label[i] for i in y])

    def predict_proba(self, X):
        dtest = xgb.DMatrix(X)
        return self.clf.predict(dtest)

    def score(self, X, y):
        # scikit-learn's log_loss stands in for the logloss helper used in the
        # linked full example; scikit-learn expects "higher is better", hence
        # the inverse.
        Y = self.predict_proba(X)
        return 1 / log_loss(y, Y, labels=sorted(self.label2num))

    def get_params(self, deep=True):
        return self.params

    def set_params(self, **params):
        if 'num_boost_round' in params:
            self.num_boost_round = params.pop('num_boost_round')
        if 'objective' in params:
            del params['objective']
        self.params.update(params)
        return self
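For completeness, here is a minimal usage sketch of the wrapper together with CalibratedClassifierCV. The split names (X_train, X_valid, X_test, ...) and the hyperparameters are assumptions mirroring the question, not part of the original example, and behaviour may vary across scikit-learn versions:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss

# num_class is required by the multi:softprob objective set inside the wrapper
clf = XGBoostClassifier(num_boost_round=100, eta=0.1, max_depth=6,
                        num_class=len(set(y_train)))
clf.fit(X_train, y_train)

# Calibrate the prefit model on the held-out validation split
sig_clf = CalibratedClassifierCV(clf, method="sigmoid", cv="prefit")
sig_clf.fit(X_valid, y_valid)
sig_clf_probs = sig_clf.predict_proba(X_test)
print("Calibrated score is", log_loss(y_test, sig_clf_probs))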
See the full example here.
Please don't hesitate to suggest smarter approaches!
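A possibly smarter route, offered only as a sketch: xgboost also ships its own scikit-learn-compatible wrapper, XGBClassifier, which exposes fit and predict_proba and can therefore be passed to CalibratedClassifierCV directly, with no hand-written wrapper. The data split names below are assumptions carried over from the question:

from xgboost import XGBClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss

# The built-in scikit-learn wrapper already satisfies the predict_proba interface
xgb_clf = XGBClassifier(n_estimators=100, max_depth=6, learning_rate=0.1)
xgb_clf.fit(X_train, y_train)

sig_clf = CalibratedClassifierCV(xgb_clf, method="sigmoid", cv="prefit")
sig_clf.fit(X_valid, y_valid)
sig_clf_probs = sig_clf.predict_proba(X_test)
print("Calibrated score is", log_loss(y_test, sig_clf_probs))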