How to save a parsnip/agua-based H2O object and retrieve it again

sca*_*der 4 r h2o tidymodels parsnip

I have the following script that uses tidymodels' agua package:

    library(tidymodels)
    library(agua)
    library(ggplot2)
    theme_set(theme_bw())
    h2o_start()

    data(concrete)
    set.seed(4595)
    concrete_split <- initial_split(concrete, strata = compressive_strength)
    concrete_train <- training(concrete_split)
    concrete_test <- testing(concrete_split)

    # run for a maximum of 120 seconds
    auto_spec <-
      auto_ml() %>%
      set_engine("h2o", max_runtime_secs = 120, seed = 1) %>%
      set_mode("regression")

    normalized_rec <-
      recipe(compressive_strength ~ ., data = concrete_train) %>%
      step_normalize(all_predictors())

    auto_wflow <-
      workflow() %>%
      add_model(auto_spec) %>%
      add_recipe(normalized_rec)

    auto_fit <- fit(auto_wflow, data = concrete_train)
    saveRDS(auto_fit, file = "test.h2o.auto_fit.rds") # save the object
    h2o_end()

At the end of that script, I save the auto_fit object to a file. But when I try to retrieve it and use it to predict on the test data:

    h2o_start()
    auto_fit <- readRDS("test.h2o.auto_fit.rds")
    predict(auto_fit, new_data = concrete_test)

I get an error:

    Error in `h2o_get_model()`:
    ! Model id does not exist on the h2o server.

Is there any way to resolve this?

The expected result is:

    predict(auto_fit, new_data = concrete_test)
    #> # A tibble: 260 × 1
    #>    .pred
    #>    <dbl>
    #>  1  40.0
    #>  2  43.0
    #>  3  38.2
    #>  4  55.7
    #>  5  41.4
    #>  6  28.1
    #>  7  53.2
    #>  8  34.5
    #>  9  51.1
    #> 10  37.9
    #> # … with 250 more rows

Update

After following Simon Couch's suggestion:

    auto_fit <- fit(auto_wflow, data = concrete_train)
    auto_fit_bundle <- bundle(auto_fit)
    saveRDS(auto_fit_bundle, file = "test.h2o.auto_fit.rds") # save the object
    h2o_end()

    # to reload
    h2o_start()
    auto_fit_bundle <- readRDS("test.h2o.auto_fit.rds")
    auto_fit <- unbundle(auto_fit_bundle)
    predict(auto_fit, new_data = concrete_test)

    rank_results(auto_fit)

I get this error message:

    Error in UseMethod("rank_results") :
      no applicable method for 'rank_results' applied to an object of class "c('H2ORegressionModel', 'H2OModel', 'Keyed')"

小智 7

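Some model objects in R require native serialization methods to be saved to and reloaded from a file. h2o objects (and thus the tidymodels objects that wrap them) are one example.

The tidymodels and vetiver teams at Posit recently collaborated on a package, bundle, that provides a consistent interface to native serialization methods. The bundle documentation covers h2o specifically.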

    library(bundle)

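In short, you will want to bundle() the prepared object you want to save, write it out with the usual saveRDS(), and then, in your new session, readRDS() the file and unbundle() the loaded object. The output of unbundle() is your ready-to-go model object. :)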

    # to save:
    auto_fit <- fit(auto_wflow, data = concrete_train)
    auto_fit_bundle <- bundle(auto_fit)
    saveRDS(auto_fit_bundle, file = "test.h2o.auto_fit.rds") # save the object
    h2o_end()

    # to reload
    h2o_start()
    auto_fit_bundle <- readRDS("test.h2o.auto_fit.rds")
    auto_fit <- unbundle(auto_fit_bundle)
    predict(auto_fit, new_data = concrete_test)
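For intuition about why a plain saveRDS() is not enough here, the following base-R sketch may help. It is only a toy analogy, not how h2o or bundle are implemented: the `server` environment stands in for the h2o server, and `fit_on_server()`, `predict_toy()`, `toy_bundle()`, and `toy_unbundle()` are hypothetical helpers made up for this illustration. The idea is that the R object only holds an id, while the model state lives elsewhere and disappears when the server is shut down.

```r
# Toy stand-in for an external model server: an environment mapping ids to
# model state. This is an analogy only, NOT h2o's actual implementation.
server <- new.env()

# "Fit" a model: the state lives on the server, the R object keeps only the id.
fit_on_server <- function(id, coefs) {
  assign(id, coefs, envir = server)
  structure(list(id = id), class = "toy_handle")
}

# Predicting needs the server-side state, looked up by id.
predict_toy <- function(handle, x) {
  if (!exists(handle$id, envir = server, inherits = FALSE)) {
    stop("Model id does not exist on the server.")
  }
  coefs <- get(handle$id, envir = server, inherits = FALSE)
  coefs[1] + coefs[2] * x
}

# Hypothetical bundle/unbundle analogues: carry the state along with the
# handle, then re-register that state on the server after loading.
toy_bundle <- function(handle) {
  list(handle = handle, state = get(handle$id, envir = server, inherits = FALSE))
}
toy_unbundle <- function(bundled) {
  assign(bundled$handle$id, bundled$state, envir = server)
  bundled$handle
}

model <- fit_on_server("model_1", c(2, 3))
plain_path <- tempfile(fileext = ".rds")
bundle_path <- tempfile(fileext = ".rds")
saveRDS(model, plain_path)               # saves only the id
saveRDS(toy_bundle(model), bundle_path)  # saves the id plus the state

# Simulate shutting the server down and starting a fresh, empty one.
rm(list = ls(server), envir = server)

# The plainly saved handle points at an id the new server has never seen.
reloaded <- readRDS(plain_path)
plain_result <- tryCatch(
  predict_toy(reloaded, 1),
  error = function(e) conditionMessage(e)
)

# The bundled version restores the state first, so prediction works again.
restored <- toy_unbundle(readRDS(bundle_path))
bundle_result <- predict_toy(restored, 1)

print(plain_result)
print(bundle_result)
```

In this toy setup, the handle saved directly fails with a "does not exist on the server" error after the restart, while the bundled one predicts normally, which mirrors the behaviour described in the question and the fix in the answer.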