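I am trying to use tidymodels to tune workflows over both recipe parameters and model parameters. Tuning a single workflow works without problems, but tuning a workflow set that contains multiple workflows always fails. Here is my code: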

    library(tidyverse)
    library(tidymodels)
    library(themis)  # provides step_downsample()

    # read the training data
    train <- read_csv("../../train.csv")
    train <- train %>%
      mutate(
        id = row_number(),
        across(where(is.double), as.integer),
        across(where(is.character), as.factor),
        r_yn = fct_relevel(r_yn, "yes")) %>%
      select(id, r_yn, everything())

    # setting the recipes

    # no preprocessing
    rec_no <- recipe(r_yn ~ ., data = train) %>%
      update_role(id, new_role = "ID")

    # downsample: tuning the under_ratio
    rec_ds_tune <- rec_no %>%
      step_downsample(r_yn, under_ratio = tune(), skip = TRUE, seed = 100) %>%
      step_nzv(all_predictors(), freq_cut = 100)

    # setting the models

    # random forest
    spec_rf_tune <- rand_forest(trees = 100, …

I recently started learning to use tidymodels to build machine learning workflows. When I use the workflow to predict on the test set, it raises the error "missing data in columns", even though I am sure that neither the training set nor the test set contains any missing data. Here is my code and an example:

    # Information about the data: Primary_type in the test set has several novel levels
    str(train_sample)
    tibble [500,000 x 3] (S3: tbl_df/tbl/data.frame)
     $ ID          : num [1:500000] 6590508 2902772 6162081 7777470 7134849 ...
     $ Primary_type: Factor w/ 29 levels "ARSON","ASSAULT",..: 16 8 3 3 28 7 3 4 25 15 ...
     $ Arrest      : Factor w/ 2 levels "FALSE","TRUE": 2 1 1 1 1 2 1 1 1 1 ...

    str(test_sample)
    tibble [300,000 x 3] (S3: tbl_df/tbl/data.frame)
     $ ID          : num [1:300000] …
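
The data summary above notes that Primary_type has levels in the test set that never appear in the training set. One plausible cause of the "missing data in columns" error, offered here as an assumption rather than a confirmed diagnosis, is that values with unseen levels are coerced to NA when the test column is aligned to the training factor levels. The base R sketch below reproduces that effect on made-up values ("KIDNAPPING" is only a hypothetical novel level, and `train_type` / `test_type` are stand-ins for the real columns) and shows one way to list the offending levels:

```r
# Hypothetical toy data standing in for train_sample$Primary_type
# and test_sample$Primary_type.
train_type <- factor(c("ARSON", "ASSAULT", "ASSAULT", "ARSON"))
test_type  <- factor(c("ASSAULT", "KIDNAPPING", "ARSON"))

# Levels present in the test set but absent from the training set.
novel_levels <- setdiff(levels(test_type), levels(train_type))
print(novel_levels)

# Re-encoding the test column with the training levels turns every
# value with an unseen level into NA, although the raw data had none.
aligned <- factor(as.character(test_type), levels = levels(train_type))
print(sum(is.na(test_type)))  # missing values before alignment
print(sum(is.na(aligned)))    # missing values after alignment
```

If this turns out to be the cause, the recipes package offers `step_novel()` for assigning unseen factor levels to a dedicated level. Whether that alone resolves the error depends on the parts of the workflow that are not shown here.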