Pre-train a model (classifier) in scikit-learn

Tes*_*est 7 python model classification scikit-learn pre-trained-model

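I want to pre-train a model and then continue training it with another model.

I have a Decision Tree Classifier model, and I want to train it further with an LGBM Classifier model. Is it possible to do this in scikit-learn? I have already read this post about it: https://datascience.stackexchange.com/questions/28512/train-new-data-to-pre-trained-model. The post says:

According to the official documentation, calling fit() more than once will overwrite what was learned by any previous fit()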

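import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# X, y: feature matrix and labels, defined elsewhere
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Train the Decision Tree Classifier
clf = DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)

# Train the LGBM Classifier (fitted independently; it does not build on clf)
lgbm = lgb.LGBMClassifier()
lgbm = lgbm.fit(X_train, y_train)

# Predict the response for the test dataset
y_pred = lgbm.predict(X_test)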

fer*_*rdy 1

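Unfortunately, this is currently not possible. According to the documentation at https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html?highlight=init_model, training can be continued if the model comes from lightgbm.

I did try this setup: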

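import pickle
from lightgbm import LGBMClassifier
from sklearn.tree import DecisionTreeClassifier

# dtc: train the decision tree on the first batch
dtc_model = DecisionTreeClassifier()
dtc_model = dtc_model.fit(X_train, y_train)

# save the fitted tree as a pickle file
dtc_fn = 'dtc.pickle.db'
with open(dtc_fn, 'wb') as f:
    pickle.dump(dtc_model, f)

# lgbm: try to continue training from the pickled tree
lgbm_model = LGBMClassifier()
lgbm_model.fit(X_train_2, y_train_2, init_model=dtc_fn)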

I got:

LightGBMError: Unknown model format or submodel type in model file dtc.pickle.db