Can I use XGBoost to boost other models (e.g., Naive Bayes, Random Forest)?

Question

Jan*_*ane 1 python machine-learning boosting xgboost

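I am working on a fraud analytics project and need some help with boosting. Previously, I used SAS Enterprise Miner to learn more about boosting/ensemble techniques, and I learned that boosting can help improve a model's performance.

Currently, my group has completed the following models in Python: Naive Bayes, Random Forest, and a Neural Network. We want to use XGBoost to improve the F1 score. I am not sure whether this is possible, since I have only come across tutorials on how to run XGBoost or Naive Bayes on its own.

I am looking for a tutorial that shows how to create a Naive Bayes model and then apply boosting to it. After that, we could compare the metrics with and without boosting to see whether they improved. I am new to machine learning, so I may have misunderstood the concept.

I thought about replacing some of the values in XGBoost, but I am not sure which one to change, or whether it can even work this way.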

Naive Bayes

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score, f1_score, precision_score, recall_score

# X_sm, y_sm: features and labels prepared earlier (not shown)
X_train, X_test, y_train, y_test = train_test_split(X_sm, y_sm, test_size=0.2, random_state=0)

nb = GaussianNB()
nb.fit(X_train, y_train)
nb_pred = nb.predict(X_test)

XGBoost

from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# X_sm, y_sm: features and labels prepared earlier (not shown)
X_train, X_test, y_train, y_test = train_test_split(X_sm, y_sm, test_size=0.2, random_state=0)

# Legacy arguments that were left at None (missing, nthread, seed, silent) are omitted
model = XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                      colsample_bynode=1, colsample_bytree=0.9, gamma=0,
                      learning_rate=0.1, max_delta_step=0, max_depth=10,
                      min_child_weight=1, n_estimators=500, n_jobs=-1,
                      objective='binary:logistic', random_state=0,
                      reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
                      subsample=0.9, verbosity=0)

model.fit(X_train, y_train)
# predict() already returns class labels, so no rounding step is needed
y_pred = model.predict(X_test)

Answer 1

des*_*aut 5

In theory, boosting any (base) classifier is easy and straightforward with scikit-learn's AdaBoostClassifier. For example, for a Naive Bayes classifier, it would be:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
# The keyword is `estimator` since scikit-learn 1.2; older releases call it `base_estimator`
model = AdaBoostClassifier(estimator=nb, n_estimators=10)
model.fit(X_train, y_train)

and so on.

In practice, we never use Naive Bayes or neural networks as base classifiers for boosting (let alone Random Forests, which are themselves an ensemble method).

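AdaBoost (and the similar boosting methods derived later, such as GBM and XGBoost) was conceived with decision trees (DTs) as base classifiers, more specifically decision stumps, i.e. DTs with a depth of only 1. There is good reason why this is still the case today: if you do not explicitly specify the base estimator argument of scikit-learn's AdaBoostClassifier above (`estimator`, called `base_estimator` before scikit-learn 1.2), it defaults to DecisionTreeClassifier(max_depth=1), i.e. a decision stump.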
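The default can be checked directly. The following is a minimal sketch, assuming a small synthetic dataset from `make_classification`; the names `X_demo`, `y_demo`, `default_model`, and `first_learner` are illustrative and do not come from the question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Synthetic stand-in data (assumption: any small binary dataset would do here)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

# No base estimator is passed, so scikit-learn falls back to its default
default_model = AdaBoostClassifier(n_estimators=5, random_state=0)
default_model.fit(X_demo, y_demo)

# Inspect the first fitted weak learner in the ensemble
first_learner = default_model.estimators_[0]
print(type(first_learner).__name__, first_learner.max_depth)
# prints: DecisionTreeClassifier 1
```

Reading the fitted `estimators_` list rather than a constructor attribute keeps the check independent of the keyword rename mentioned above.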

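DTs are well suited to this kind of ensembling because they are inherently unstable classifiers. That is not the case for the other algorithms mentioned, so those are not expected to offer anything when used as base classifiers for boosting algorithms.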
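To compare the metrics with and without boosting, as the question asks, something along these lines could be used. It is only a sketch: the data is a synthetic, imbalanced stand-in for the fraud dataset (an assumption, roughly 10% positives), and the names `candidates` and `f1_scores` are hypothetical. The resulting numbers depend on the data and the scikit-learn version, so none are quoted here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic, imbalanced stand-in for the fraud data (assumption, not the real dataset)
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

candidates = {
    "plain GaussianNB": GaussianNB(),
    # Base estimator passed positionally, because its keyword name
    # differs between scikit-learn versions
    "AdaBoost + GaussianNB": AdaBoostClassifier(GaussianNB(), n_estimators=10,
                                                random_state=0),
    "AdaBoost + stumps (default)": AdaBoostClassifier(n_estimators=10,
                                                      random_state=0),
}

# Fit each candidate on the same split and record its F1 score on the test set
f1_scores = {}
for name, clf in candidates.items():
    clf.fit(X_tr, y_tr)
    f1_scores[name] = f1_score(y_te, clf.predict(X_te))
    print(f"{name}: F1 = {f1_scores[name]:.3f}")
```

Replacing `X` and `y` with the real features and labels gives a like-for-like comparison on the same split; a single split is noisy, so cross-validation would give a more reliable picture.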
