dok*_*who 7 machine-learning scikit-learn lightgbm
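I am experimenting with the LightGBM Python API through both the Training API (http://lightgbm.readthedocs.io/en/latest/Python-API.html#training-api) and the Scikit-learn API (http://lightgbm.readthedocs.io/en/latest/Python-API.html#scikit-learn-api).
I cannot work out a clear mapping between the two APIs, as the example below shows. The idea is to train on 50% of a synthetic data set (the even-indexed samples) and predict on the other half.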
import numpy as np
import lightgbm as lgbm
# Generate Data Set
xs = np.linspace(0, 10, 100).reshape((-1, 1))
ys = xs**2 + 4*xs + 5.2
ys = ys.reshape((-1,))
# LGBM configuration
alg_conf = {
    "num_boost_round": 25,
    "max_depth": 3,
    "num_leaves": 31,
    "learning_rate": 0.1,
    "boosting_type": "gbdt",
    "objective": "regression_l2",
    "early_stopping_rounds": None,
}
# Calling Regressor using scikit-learn API
sk_reg = lgbm.sklearn.LGBMRegressor(
    num_leaves=alg_conf["num_leaves"],
    n_estimators=alg_conf["num_boost_round"],
    max_depth=alg_conf["max_depth"],
    learning_rate=alg_conf["learning_rate"],
    objective=alg_conf["objective"]
)
sk_reg.fit(xs[::2], ys[::2])
print("Scikit-learn API results")
print(sk_reg.predict(xs[1::2]))
# Calling Regressor using native API
train_dataset = lgbm.Dataset(xs[::2], ys[::2])
lg_reg = lgbm.train(alg_conf.copy(), train_dataset)
print("Native API results")
print(lg_reg.predict(xs[1::2]))
Scikit-learn API results
[ 14.35693851 14.35693851 14.35693851 14.35693851 14.35693851
14.35693851 14.35693851 14.35693851 14.35693851 14.35693851
25.37944751 25.37944751 25.37944751 25.37944751 25.37944751
35.10572544 35.10572544 35.10572544 35.10572544 35.10572544
46.50667974 46.50667974 46.50667974 46.50667974 46.50667974
59.44952419 59.44952419 59.44952419 59.44952419 59.44952419
75.42846332 75.42846332 75.42846332 75.42846332 75.42846332
109.4610814 109.4610814 109.4610814 109.4610814 109.4610814
109.4610814 109.4610814 109.4610814 109.4610814 109.4610814
109.4610814 109.4610814 109.4610814 109.4610814 109.4610814 ]
Native API results
[ 22.55947971 22.55947971 22.55947971 22.55947971 22.55947971
22.55947971 22.55947971 22.55947971 22.55947971 22.55947971
22.55947971 22.55947971 22.55947971 22.55947971 22.55947971
22.55947971 22.55947971 22.55947971 22.55947971 22.55947971
45.33537795 45.33537795 45.33537795 45.33537795 45.33537795
91.6376959 91.6376959 91.6376959 91.6376959 91.6376959
91.6376959 91.6376959 91.6376959 91.6376959 91.6376959
91.6376959 91.6376959 91.6376959 91.6376959 91.6376959
91.6376959 91.6376959 91.6376959 91.6376959 91.6376959
91.6376959 91.6376959 91.6376959 91.6376959 91.6376959 ]
Where can I find an explicit equivalence between the parameters of the two APIs?
Thanks a lot.
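I got an answer on the LightGBM GitHub. Sharing the result below:
Adding "min_child_weight": 1e-3 and "min_child_samples": 20 to alg_conf, and passing the same values to the scikit-learn estimator, removed the discrepancy: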
import numpy as np
import lightgbm as lgbm
# Generate Data Set
xs = np.linspace(0, 10, 100).reshape((-1, 1))
ys = xs**2 + 4*xs + 5.2
ys = ys.reshape((-1,))
# LGBM configuration
# min_child_weight and min_child_samples are now set explicitly,
# so both APIs train with the same leaf constraints.
alg_conf = {
    "num_boost_round": 25,
    "max_depth": 3,
    "num_leaves": 31,
    "learning_rate": 0.1,
    "boosting_type": "gbdt",
    "objective": "regression_l2",
    "early_stopping_rounds": None,
    "min_child_weight": 1e-3,
    "min_child_samples": 20
}
# Calling Regressor using scikit-learn API
sk_reg = lgbm.sklearn.LGBMRegressor(
    num_leaves=alg_conf["num_leaves"],
    n_estimators=alg_conf["num_boost_round"],
    max_depth=alg_conf["max_depth"],
    learning_rate=alg_conf["learning_rate"],
    objective=alg_conf["objective"],
    # min_sum_hessian_in_leaf is the native name for min_child_weight
    min_sum_hessian_in_leaf=alg_conf["min_child_weight"],
    # min_data_in_leaf is the native name for min_child_samples
    min_data_in_leaf=alg_conf["min_child_samples"]
)
sk_reg.fit(xs[::2], ys[::2])
print("Scikit-learn API results")
print(sk_reg.predict(xs[1::2]))
# Calling Regressor using native API
train_dataset = lgbm.Dataset(xs[::2], ys[::2])
lg_reg = lgbm.train(alg_conf.copy(), train_dataset)
print("Native API results")
print(lg_reg.predict(xs[1::2]))
This works well.
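On the original question of where the equivalences are written down: the Parameters page of the LightGBM documentation lists each native parameter together with its aliases, and the scikit-learn argument names appear there as aliases. Below is a minimal sketch of that correspondence in plain Python. The SKLEARN_TO_NATIVE table and the to_native_params helper are hypothetical names introduced for illustration, not part of LightGBM; the table covers only a subset of the parameters and should be checked against the documentation for the installed version. Since the fix above consisted of pinning two values explicitly, it seems the two entry points did not share defaults for them in the version used here; that is an inference, not something the thread states.

```python
# Illustrative subset: scikit-learn wrapper argument name -> native parameter name.
# Assumption: names as listed (with aliases) on the LightGBM Parameters page;
# verify against the documentation for your installed version.
SKLEARN_TO_NATIVE = {
    "n_estimators": "num_iterations",
    "boosting_type": "boosting",
    "min_child_weight": "min_sum_hessian_in_leaf",
    "min_child_samples": "min_data_in_leaf",
    "min_split_gain": "min_gain_to_split",
    "subsample": "bagging_fraction",
    "subsample_freq": "bagging_freq",
    "subsample_for_bin": "bin_construct_sample_cnt",
    "colsample_bytree": "feature_fraction",
    "reg_alpha": "lambda_l1",
    "reg_lambda": "lambda_l2",
    "random_state": "seed",
    "n_jobs": "num_threads",
}


def to_native_params(sklearn_params):
    """Hypothetical helper: rename scikit-learn style keys to native names.

    Keys without an entry in the table (num_leaves, max_depth, ...) are
    assumed to be spelled the same in both APIs and are passed through.
    Entries whose value is None are dropped rather than forwarded.
    """
    native = {}
    for key, value in sklearn_params.items():
        if value is None:
            continue
        native[SKLEARN_TO_NATIVE.get(key, key)] = value
    return native


# The configuration from the answer, written with scikit-learn names.
sk_params = {
    "num_leaves": 31,
    "n_estimators": 25,
    "max_depth": 3,
    "learning_rate": 0.1,
    "objective": "regression_l2",
    "min_child_weight": 1e-3,
    "min_child_samples": 20,
}

native_params = to_native_params(sk_params)
for name, value in native_params.items():
    print(name, "=", value)
```

Writing every parameter out in one dictionary and deriving the other from it makes it harder for a value to be set in one API and silently left at its default in the other, which is what produced the mismatch in the question.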