Mih*_*iha 1 r r-recipes tidymodels
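What is the most elegant way to transform back a column (the outcome, in this case mpg) that was transformed by a recipe? The solution can be generic (if one exists) or work only for the log and normalize steps (coded below).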
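Links that might be useful: here a general solution is discussed, but I think it has not been implemented yet. Here a solution is provided for the R function scale, but I am not sure it can help in this case.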
library(recipes)
data <- tibble(mtcars) %>%
  select(cyl, mpg)
rec <- recipe(mpg ~ ., data = data) %>%
  step_log(all_numeric()) %>%
  step_normalize(all_numeric()) %>%
  prep()
data_baked <- bake(rec, new_data = data)
# model fitting, predictions, etc...
# how to invert/transform back predictions (estimates) and true outcomes
The way to get the values you need out of the recipe transformations is to tidy() the recipe and then use dplyr verbs to pull out what you need.
library(recipes)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#>
#> step
data <- tibble(mtcars) %>%
  select(cyl, mpg)
rec <- recipe(mpg ~ ., data = data) %>%
  step_log(all_numeric()) %>%
  step_normalize(all_numeric(), id = "normalize_num") %>%
  prep()
There are two ways to pick out a recipe step, which you can then tidy() by passing the matching argument:
## notice that you can identify steps by `number` or `id`
tidy(rec)
#> # A tibble: 2 x 6
#> number operation type trained skip id
#> <int> <chr> <chr> <lgl> <lgl> <chr>
#> 1 1 step log TRUE FALSE log_LYuaY
#> 2 2 step normalize TRUE FALSE normalize_num
## choose by number
tidy(rec, number = 1)
#> # A tibble: 2 x 3
#> terms base id
#> <chr> <dbl> <chr>
#> 1 cyl 2.72 log_LYuaY
#> 2 mpg 2.72 log_LYuaY
## choose by id, which we set above (otherwise the step gets a random id such as log_LYuaY)
tidy(rec, id = "normalize_num")
#> # A tibble: 4 x 4
#> terms statistic value id
#> <chr> <chr> <dbl> <chr>
#> 1 cyl mean 1.78 normalize_num
#> 2 mpg mean 2.96 normalize_num
#> 3 cyl sd 0.309 normalize_num
#> 4 mpg sd 0.298 normalize_num
Once we know which step we want, we can use dplyr verbs to get out exactly the value we need to transform back, such as the mean for mpg.
## extract out value
tidy(rec, id = "normalize_num") %>%
  filter(terms == "mpg", statistic == "mean") %>%
  pull(value)
#> mpg
#> 2.957514
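To actually invert the outcome, the extracted values can be applied in reverse order of the recipe steps: first undo the normalization (multiply by the sd, add the mean), then undo the log (raise the base to that power). The following is a minimal sketch of that idea for this specific two-step recipe, not a general inverse for arbitrary recipes. The helper `invert_mpg()` and the variables `mpg_mean`, `mpg_sd`, and `log_base` are hypothetical names introduced here for illustration; it also assumes the log step is step number 1, as in the recipe above.

```r
library(recipes)  # also attaches dplyr (tibble(), select(), filter(), pull())

data <- tibble(mtcars) %>%
  select(cyl, mpg)

rec <- recipe(mpg ~ ., data = data) %>%
  step_log(all_numeric()) %>%
  step_normalize(all_numeric(), id = "normalize_num") %>%
  prep()

data_baked <- bake(rec, new_data = data)

# Trained mean and sd of log(mpg), taken from the normalize step
norm_stats <- tidy(rec, id = "normalize_num") %>%
  filter(terms == "mpg")
mpg_mean <- norm_stats %>% filter(statistic == "mean") %>% pull(value) %>% unname()
mpg_sd   <- norm_stats %>% filter(statistic == "sd") %>% pull(value) %>% unname()

# Base used by the log step (assumed to be step 1 of this recipe)
log_base <- tidy(rec, number = 1) %>%
  filter(terms == "mpg") %>%
  pull(base) %>%
  unname()

# Hypothetical helper: undo the steps in reverse order,
# de-normalize first, then invert the log
invert_mpg <- function(x) {
  log_base^(x * mpg_sd + mpg_mean)
}

# Round-trip check: the baked outcome should map back to the original mpg
mpg_restored <- invert_mpg(data_baked$mpg)
all.equal(mpg_restored, data$mpg)
```

The same `invert_mpg()` could be applied to model predictions made on the transformed scale, since they live in the same units as the baked mpg column. If the recipe gains or loses steps, the inverse has to be updated by hand to match.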
Created on 2021-01-25 by the reprex package (v0.3.0)