gwa*_*nim 8 python machine-learning xgboost
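I recently built a fully working random forest regression program using the scikit-learn RandomForestRegressor model, and I now want to compare its performance against other libraries. I found the scikit-learn API for XGBoost random forest regression and ran a small test with an X feature matrix and a Y target vector that are all zeros.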
from numpy import array
from xgboost import XGBRFRegressor
from sklearn.ensemble import RandomForestRegressor
tree_number = 100
depth = 10
jobs = 1
dimension = 19
sk_VAL = RandomForestRegressor(n_estimators=tree_number, max_depth=depth, random_state=42,
                               n_jobs=jobs)
xgb_VAL = XGBRFRegressor(n_estimators=tree_number, max_depth=depth, random_state=42,
                         n_jobs=jobs)
dataset = array([[0.0] * dimension, [0.0] * dimension])
y_val = array([0.0, 0.0])
sk_VAL.fit(dataset, y_val)
xgb_VAL.fit(dataset, y_val)
sk_predict = sk_VAL.predict(array([[0.0] * dimension]))
xgb_predict = xgb_VAL.predict(array([[0.0] * dimension]))
print("sk_prediction = {}\nxgb_prediction = {}".format(sk_predict, xgb_predict))
Surprisingly, the xgb_VAL model predicts a non-zero value for an all-zero input sample:
sk_prediction = [0.]
xgb_prediction = [0.02500369]
What is wrong with my evaluation, or with how I set up the comparison?
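XGBoost appears to include a global bias in the model, and this bias is fixed at 0.5 rather than being computed from the input data. This has been raised as an issue in the XGBoost GitHub repository (see https://github.com/dmlc/xgboost/issues/799). The corresponding hyperparameter is base_score; if you set it to zero, your model predicts zero as expected.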
from numpy import array
from xgboost import XGBRFRegressor
from sklearn.ensemble import RandomForestRegressor
tree_number = 100
depth = 10
jobs = 1
dimension = 19
sk_VAL = RandomForestRegressor(n_estimators=tree_number, max_depth=depth, random_state=42, n_jobs=jobs)
xgb_VAL = XGBRFRegressor(n_estimators=tree_number, max_depth=depth, base_score=0, random_state=42, n_jobs=jobs)
dataset = array([[0.0] * dimension, [0.0] * dimension])
y_val = array([0.0, 0.0])
sk_VAL.fit(dataset, y_val)
xgb_VAL.fit(dataset, y_val)
sk_predict = sk_VAL.predict(array([[0.0] * dimension]))
xgb_predict = xgb_VAL.predict(array([[0.0] * dimension]))
print("sk_prediction = {}\nxgb_prediction = {}".format(sk_predict, xgb_predict))
#sk_prediction = [0.]
#xgb_prediction = [0.]
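To see why a fixed global bias matters here, the following is a simplified sketch and not XGBoost's actual implementation. It assumes a single tree with one leaf, squared-error loss, and the standard leaf-weight formula -G / (H + lambda). The function name single_leaf_prediction is hypothetical, and reg_lambda=1e-5 is assumed to match the XGBRFRegressor default. It does not reproduce the exact 0.025 from the question, which may also depend on other details of the forest such as row subsampling.

```python
def single_leaf_prediction(targets, base_score, reg_lambda=1e-5, learning_rate=1.0):
    """Prediction of one single-leaf tree fitted with squared-error loss.

    Simplified illustration only: real XGBoost forests also subsample rows
    and columns and average many trees.
    """
    # For squared error, each sample's gradient is (prediction - target)
    # and its hessian is 1. The initial prediction is base_score.
    gradient_sum = sum(base_score - y for y in targets)
    hessian_sum = float(len(targets))
    # Regularised leaf weight: -G / (H + lambda).
    leaf_weight = -gradient_sum / (hessian_sum + reg_lambda)
    # Final prediction is the global bias plus the (scaled) leaf weight.
    return base_score + learning_rate * leaf_weight


targets = [0.0, 0.0]
# With a bias of 0.5 the regularised leaf cannot cancel it completely,
# so a small positive residual remains.
print(single_leaf_prediction(targets, base_score=0.5))
# With a bias of 0 the gradients are all zero and the prediction is exactly zero.
print(single_leaf_prediction(targets, base_score=0.0))
```

In this sketch the tree only ever learns a correction to base_score, so any part of that correction that is shrunk or skipped shows up directly as a non-zero prediction. Starting from base_score=0 avoids the problem for all-zero targets because there is nothing to correct.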