Над*_*ева 5 machine-learning python-3.x scikit-learn xgboost
I want to implement GridSearchCV for an XGBoost model inside a pipeline. I have a data preprocessor, defined earlier in the code, and some grid parameters:
XGBmodel = XGBRegressor(random_state=0)
pipe = Pipeline(steps=[
    ('preprocess', preprocessor),
    ('XGBmodel', XGBmodel)
])
I want to pass these fit parameters:
fit_params = {"XGBmodel__eval_set": [(X_valid, y_valid)],
              "XGBmodel__early_stopping_rounds": 10,
              "XGBmodel__verbose": False}
I am trying to fit the model:
searchCV = GridSearchCV(pipe, cv=5, param_grid=param_grid, fit_params=fit_params)
searchCV.fit(X_train, y_train)
But I get the following error for eval_set:
DataFrame.dtypes for data must be int, float or bool
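I guess this is because the validation data never goes through the preprocessor, but when I googled it, everyone seems to do it this way, so it looks like it should work. I also tried to find a way to apply the preprocessor to the validation data separately, but the validation data cannot be transformed unless the preprocessor has first been fitted on the training data.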
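To illustrate the constraint described above, here is a minimal sketch of fitting the preprocessor on the training data first and only then transforming the validation data. The toy DataFrames and the column names `area` and `city` are hypothetical stand-ins for the real data, and `sparse_threshold=0` is added only so the toy output is a dense array that is easy to inspect.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the real training/validation frames.
num_cols = ["area"]
cat_cols = ["city"]
X_train = pd.DataFrame({"area": [50.0, np.nan, 80.0, 65.0],
                        "city": ["A", "B", "A", "B"]})
X_valid = pd.DataFrame({"area": [70.0, np.nan],
                        "city": ["B", "C"]})  # "C" never appears in training

cat_preprocessor = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer(
    transformers=[
        ("num", SimpleImputer(strategy="mean"), num_cols),
        ("cat", cat_preprocessor, cat_cols),
    ],
    sparse_threshold=0,  # assumption: force dense output for this toy example
)

# Learn imputation values and one-hot categories from the training data only.
X_train_t = preprocessor.fit_transform(X_train)

# Reuse the fitted preprocessor on the validation data (no refitting).
X_valid_t = preprocessor.transform(X_valid)

print(X_train_t.shape, X_valid_t.shape)
print(X_valid_t)
```

The transformed `X_valid_t` is purely numeric, so it is the kind of input that `eval_set` expects. Note that inside GridSearchCV the pipeline's preprocessor is refitted on each fold, so a preprocessor fitted once on all of `X_train` is only an approximation of what each fold sees.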
Full code:
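from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from xgboost import XGBRegressor

columns = num_cols + cat_cols
X_train = X_full_train[columns].copy()
X_valid = X_full_valid[columns].copy()

# Numeric columns: fill missing values with the column mean
num_preprocessor = SimpleImputer(strategy='mean')

# Categorical columns: fill missing values, then one-hot encode
cat_preprocessor = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(transformers=[
    ('num', num_preprocessor, num_cols),
    ('cat', cat_preprocessor, cat_cols)
])

XGBmodel = XGBRegressor(random_state=0)
pipe = Pipeline(steps=[
    ('preprocess', preprocessor),
    ('XGBmodel', XGBmodel)
])

param_grid = {
    "XGBmodel__n_estimators": [10, 50, 100, 500],
    "XGBmodel__learning_rate": [0.1, 0.5, 1],
}

# X_valid is passed here as a raw DataFrame; it does not go through 'preprocess'
fit_params = {"XGBmodel__eval_set": [(X_valid, y_valid)],
              "XGBmodel__early_stopping_rounds": 10,
              "XGBmodel__verbose": False}

searchCV = GridSearchCV(pipe, cv=5, param_grid=param_grid, fit_params=fit_params)
searchCV.fit(X_train, y_train)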
Is there any way to preprocess the validation data within the pipeline? Or maybe a completely different way to implement this?