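I am trying to use LightGBM as a multi-output predictor, as suggested here. I want to predict values for thirty consecutive days. I have a panel dataset, so I cannot use traditional time series methods.
My dataset is very large, so training the model without early stopping takes a long time. I therefore tried passing the eval_set, early_stopping_rounds and eval_metric parameters, as follows: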
from lightgbm import LGBMRegressor
from sklearn.multioutput import MultiOutputRegressor

hyper_params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'regression',
    'metric': ['l1', 'l2'],
    'learning_rate': 0.01,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.7,
    'bagging_freq': 10,
    'verbose': 0,
    'max_depth': 8,
    'num_leaves': 128,
    'max_bin': 512,
    'num_iterations': 10000
}
lgbc_fit_params = {
    'early_stopping_rounds': 300,
    'eval_set': (X_test, y_test_array),
    'eval_metric': 'l1'
}
# LGBMRegressor is imported directly above, so no `lgb.` prefix is needed
gbm = LGBMRegressor(**hyper_params)
regr_multiglb = MultiOutputRegressor(gbm)
regr_multiglb.fit(X_train, y_train_array, **lgbc_fit_params)
Here, y_train_array and y_test_array are both two-dimensional numpy arrays, with shapes (1953395, 30) …
I have a dataframe like this.
import pandas as pd

df = pd.DataFrame({'User': ['A', 'A', 'A', 'A', 'B', 'B'],
                   'Month': ['2017-01-01', '2017-03-01', '2017-05-01', '2017-09-01', '2017-01-01', '2017-05-01'],
                   'count': [2, 2, 2, 2, 5, 5]})
I want to fill in the missing months with zero counts so that it looks like this:
df = pd.DataFrame({'User': ['A','A','A','A','A','A','A','A','A','A','A','A','B','B','B','B','B','B','B','B','B','B','B','B'],
                   'Month': ['2017-01-01','2017-02-01','2017-03-01','2017-04-01','2017-05-01','2017-06-01','2017-07-01','2017-08-01','2017-09-01','2017-10-01','2017-11-01','2017-12-01','2017-01-01','2017-02-01','2017-03-01','2017-04-01','2017-05-01','2017-06-01','2017-07-01','2017-08-01','2017-09-01','2017-10-01','2017-11-01','2017-12-01'],
                   'count': [2,0,2,0,2,0,0,0,2,0,0,0,5,0,0,0,5,0,0,0,0,0,0,0]})
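One way to get from the sparse frame to the filled one is to build the full grid of users and months and reindex onto it. This is a minimal sketch, assuming the target range is every month start in 2017 (as the desired output suggests) and that Month stays a string column. The names `all_months`, `full_index` and `filled` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'User': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Month': ['2017-01-01', '2017-03-01', '2017-05-01', '2017-09-01',
              '2017-01-01', '2017-05-01'],
    'count': [2, 2, 2, 2, 5, 5],
})

# Assumption: the grid covers all twelve month starts of 2017,
# formatted as strings to match the existing Month column.
all_months = pd.date_range('2017-01-01', '2017-12-01', freq='MS').strftime('%Y-%m-%d')

# Every (User, Month) combination that should exist in the result
full_index = pd.MultiIndex.from_product(
    [df['User'].unique(), all_months], names=['User', 'Month']
)

# Reindex onto the full grid; combinations absent from df get count 0
filled = (
    df.set_index(['User', 'Month'])
      .reindex(full_index, fill_value=0)
      .reset_index()
)
```

Passing fill_value=0 to reindex keeps the count column as integers, whereas reindexing first and calling fillna afterwards would go through NaN and convert it to floats.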