I have four categorical features and a fifth numerical feature (Var5). When I try the following code:
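from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from xgboost import XGBRegressor

# One-hot encode the four categorical columns; the remaining column passes through unchanged
cat_attribs = ['var1','var2','var3','var4']
full_pipeline = ColumnTransformer([('cat', OneHotEncoder(handle_unknown = 'ignore'), cat_attribs)], remainder = 'passthrough')
X_train = full_pipeline.fit_transform(X_train)
model = XGBRegressor(n_estimators=10, max_depth=20, verbosity=2)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)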
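When the model tries to make predictions, I get the following error message:
ValueError: DataFrame.dtypes for data must be int, float, bool or categorical. When categorical type is supplied, DMatrix parameter
enable_categorical must be set to True.Var1, Var2, Var3, Var4
Does anyone know what is going wrong here?
In case it helps, here is a small sample of the X_train and y_train data: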
         Var1  Var2  Var3 Var4        Var5
1507856    JP  2009  6581  OME  325.787218
839624     FR  2018  5783  I_S   11.956326
1395729    BE  2015  6719  OME   42.888565
1971169    DK  2011  3506  RPP   70.094146
1140120    AT  2019  5474  NMM  …
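
One possible cause, which the post does not confirm: X_train goes through full_pipeline.fit_transform, but X_test is handed to model.predict as the original DataFrame, so its string columns never get encoded. The sketch below uses only the preprocessing step to show the already-fitted transformer being reused on the test data. The toy frames (train, test) and the names X_train_enc and X_test_enc are made up for illustration, with values loosely based on the sample above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Illustrative data only; not the asker's real dataset.
train = pd.DataFrame({
    "Var1": ["JP", "FR", "BE", "DK"],
    "Var2": [2009, 2018, 2015, 2011],
    "Var3": [6581, 5783, 6719, 3506],
    "Var4": ["OME", "I_S", "OME", "RPP"],
    "Var5": [325.787218, 11.956326, 42.888565, 70.094146],
})
test = pd.DataFrame({
    "Var1": ["AT", "JP"],      # "AT" does not occur in the training frame
    "Var2": [2019, 2009],
    "Var3": [5474, 6581],
    "Var4": ["NMM", "OME"],
    "Var5": [1.0, 2.0],
})

cat_attribs = ["Var1", "Var2", "Var3", "Var4"]
full_pipeline = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), cat_attribs)],
    remainder="passthrough",
)

# Learn the categories from the training data only.
X_train_enc = full_pipeline.fit_transform(train)

# Reuse the fitted transformer on the test data: transform, not fit_transform.
# Categories unseen during fit become all-zero one-hot blocks.
X_test_enc = full_pipeline.transform(test)

# Both matrices now have the same number of purely numeric columns.
print(X_train_enc.shape, X_test_enc.shape)
```

If this is indeed the problem, the corresponding change to the code in the question would be to call model.predict(full_pipeline.transform(X_test)) instead of model.predict(X_test).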