Kic*_*icK 5 python pipeline machine-learning xgboost
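I am using XGBRegressor with a Pipeline. The pipeline contains the preprocessing steps and the model (XGBRegressor).
Below are the complete preprocessing steps. (I have already defined numeric_cols and cat_cols.)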
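from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Numeric columns: fill missing values (default strategy is the mean)
numerical_transfer = SimpleImputer()
# Categorical columns: fill missing values with the mode, then one-hot encode
cat_transfer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numerical_transfer, numeric_cols),
        ('cat', cat_transfer, cat_cols)
    ])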
The final pipeline is:
my_model = Pipeline(steps = [('preprocessor', preprocessor), ('model', model)])
When I fit without early_stopping_rounds, the code works fine:
my_model.fit(X_train, y_train)
But when I use early_stopping_rounds as shown below, I get an error.
my_model.fit(X_train, y_train, model__early_stopping_rounds=5, model__eval_metric = "mae", model__eval_set=[(X_valid, y_valid)])
The error is:
ValueError: DataFrame.dtypes for data must be int, float or bool.
Did not expect the data types in fields MSZoning, Street, Alley, LotShape, LandContour, Utilities, LotConfig, LandSlope, Condition1, Condition2, BldgType, HouseStyle, RoofStyle, RoofMatl, MasVnrType, ExterQual, ExterCond, Foundation, BsmtQual, BsmtCond, BsmtExposure, BsmtFinType1, BsmtFinType2, Heating, HeatingQC, CentralAir, Electrical, KitchenQual, Functional, FireplaceQu, GarageType, GarageFinish, GarageQual, GarageCond, PavedDrive, PoolQC, Fence, MiscFeature, SaleType, SaleCondition
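Does this mean I should preprocess X_valid before passing it to my_model.fit(), or am I doing something wrong?
If the problem is that X_valid has to be preprocessed before calling fit(), how do I do that with the preprocessor I defined above?
Edit: I tried preprocessing X_valid without the Pipeline, but got an error saying the features do not match.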
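The problem is that the pipeline does not apply its preprocessing steps to eval_set: fit parameters prefixed with model__ are passed to the model step unchanged. So, as you said, you need to preprocess X_valid yourself. The easiest way is to use a pipeline without the 'model' step. Run the following code before fitting the pipeline: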
# Make a copy to avoid changing the original data
X_valid_eval = X_valid.copy()
# Build a pipeline that contains only the preprocessing step (no model)
eval_set_pipe = Pipeline(steps=[('preprocessor', preprocessor)])
# Fit the preprocessor on the training data, then transform the validation copy
X_valid_eval = eval_set_pipe.fit(X_train, y_train).transform(X_valid_eval)
Then fit your pipeline with model__eval_set changed as follows:
my_model.fit(X_train, y_train, model__early_stopping_rounds=5, model__eval_metric = "mae", model__eval_set=[(X_valid_eval, y_valid)])
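One possible cause of a feature-mismatch error like the one mentioned in the question's edit is fitting a separate preprocessor on X_valid instead of reusing the one fitted on X_train. The sketch below illustrates this with a small made-up dataset; the column values and the make_preprocessor helper are assumptions for illustration only, and the model step is left out so that only scikit-learn and pandas are needed.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data (hypothetical values): one numeric and one categorical column.
X_train = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, np.nan, 11250.0],
    "Street": ["Pave", "Grvl", "Pave", np.nan],
})
X_valid = pd.DataFrame({
    "LotArea": [9550.0, np.nan],
    "Street": ["Pave", "Pave"],  # "Grvl" never appears in the validation data
})
numeric_cols = ["LotArea"]
cat_cols = ["Street"]


def make_preprocessor():
    """Hypothetical helper: builds the same preprocessor as in the question."""
    cat_transfer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer(transformers=[
        ("num", SimpleImputer(), numeric_cols),
        ("cat", cat_transfer, cat_cols),
    ])


# Fit on the training data only, then reuse the fitted preprocessor.
preprocessor = make_preprocessor()
X_train_t = preprocessor.fit_transform(X_train)
X_valid_ok = preprocessor.transform(X_valid)

# Fitting a fresh preprocessor on the validation data learns a different
# set of categories, so the number of one-hot columns changes.
X_valid_bad = make_preprocessor().fit_transform(X_valid)

print(X_train_t.shape[1], X_valid_ok.shape[1], X_valid_bad.shape[1])
```

With this toy data the training matrix and the correctly transformed validation matrix have the same number of columns, while the separately fitted one has fewer, which is the kind of mismatch a model would reject.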