Elegant way to invert recipe transformation steps (normalize and log)?

Mih*_*iha 1 r r-recipes tidymodels

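What is the most elegant way to transform back an outcome column (in this case `mpg`) that was transformed by a recipe? The solution can be general (if one exists) or work only for the `log` and `normalize` steps (coded below).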

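Possibly useful links: a general solution is discussed here, but I don't think it has been implemented yet. A solution for the R function `scale` is provided here, but I'm not sure it helps in this case.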

library(recipes)

data <- tibble(mtcars) %>% 
    select(cyl, mpg)

rec <- recipe(mpg ~ ., data = data) %>%
    step_log(all_numeric()) %>%
    step_normalize(all_numeric()) %>%
    prep()

data_baked <- bake(rec, new_data = data)

# model fitting, predictions, etc...

# how to invert/transform back predictions (estimates) and true outcomes


Jul*_*lge 5

The way to get the values you need out of the recipe transformations is to `tidy()` the recipe and then use dplyr verbs to pull out what you want.

library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step

data <- tibble(mtcars) %>% 
  select(cyl, mpg)

rec <- recipe(mpg ~ ., data = data) %>%
  step_log(all_numeric()) %>%
  step_normalize(all_numeric(), id = "normalize_num") %>%
  prep()

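There are two ways to pick out a recipe step: by `number` or by `id`. Call `tidy()` on the recipe to list the steps, then call `tidy()` again with one of those arguments: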

## notice that you can identify steps by `number` or `id`
tidy(rec)
#> # A tibble: 2 x 6
#>   number operation type      trained skip  id           
#>    <int> <chr>     <chr>     <lgl>   <lgl> <chr>        
#> 1      1 step      log       TRUE    FALSE log_LYuaY    
#> 2      2 step      normalize TRUE    FALSE normalize_num

## choose by number
tidy(rec, number = 1)
#> # A tibble: 2 x 3
#>   terms  base id       
#>   <chr> <dbl> <chr>    
#> 1 cyl    2.72 log_LYuaY
#> 2 mpg    2.72 log_LYuaY
## choose by id, which we set above (otherwise a random id is generated, as for the log step)
tidy(rec, id = "normalize_num")
#> # A tibble: 4 x 4
#>   terms statistic value id           
#>   <chr> <chr>     <dbl> <chr>        
#> 1 cyl   mean      1.78  normalize_num
#> 2 mpg   mean      2.96  normalize_num
#> 3 cyl   sd        0.309 normalize_num
#> 4 mpg   sd        0.298 normalize_num
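
The normalize step stores its statistics in long format, one row per column and statistic. As a rough sketch, assuming the tidyr package is available (it is not used in the original answer), the same output can be reshaped so that each column's mean and sd sit side by side. The name `norm_wide` is only illustrative.

```r
library(recipes)
library(tidyr)  # assumption: tidyr is installed; used only for pivot_wider()

data <- tibble(mtcars) %>%
  select(cyl, mpg)

rec <- recipe(mpg ~ ., data = data) %>%
  step_log(all_numeric()) %>%
  step_normalize(all_numeric(), id = "normalize_num") %>%
  prep()

# One row per column, with its mean and sd as separate columns
norm_wide <- tidy(rec, id = "normalize_num") %>%
  select(terms, statistic, value) %>%
  mutate(value = unname(value)) %>%  # drop element names, if any
  pivot_wider(names_from = statistic, values_from = value)

norm_wide
```

This layout can be convenient when several columns need to be transformed back at once.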

Once we know which step we want, we can use dplyr verbs to get exactly the value needed for transforming back, such as the mean of `mpg`:

## extract out value
tidy(rec, id = "normalize_num") %>%
  filter(terms == "mpg", statistic == "mean") %>%
  pull(value)
#>      mpg 
#> 2.957514
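
The extracted values can then be used to undo both steps. The following is a minimal sketch, not a built-in recipes feature: the helper `invert_outcome()` is hypothetical, and it assumes the recipe has exactly the log and normalize steps shown above, with the normalize step given `id = "normalize_num"`. Because the recipe computes z = (log(mpg) - mean) / sd, the inverse is mpg = base^(z * sd + mean), applied in the reverse order of the steps.

```r
library(recipes)

data <- tibble(mtcars) %>%
  select(cyl, mpg)

rec <- recipe(mpg ~ ., data = data) %>%
  step_log(all_numeric()) %>%
  step_normalize(all_numeric(), id = "normalize_num") %>%
  prep()

# Hypothetical helper (not part of recipes): undo normalize, then undo log,
# for one column, using the values stored in the prepped recipe.
invert_outcome <- function(x, rec, column = "mpg") {
  norm_stats <- tidy(rec, id = "normalize_num") %>%
    filter(terms == column)
  mu <- norm_stats %>%
    filter(statistic == "mean") %>%
    pull(value) %>%
    unname()
  sigma <- norm_stats %>%
    filter(statistic == "sd") %>%
    pull(value) %>%
    unname()
  log_base <- tidy(rec, number = 1) %>%
    filter(terms == column) %>%
    pull(base) %>%
    unname()
  # Forward: z = (log(x, base) - mu) / sigma; invert in reverse order
  log_base^(x * sigma + mu)
}

# Round trip: bake the outcome, then transform it back
data_baked <- bake(rec, new_data = data)
mpg_back <- invert_outcome(data_baked$mpg, rec)
all.equal(mpg_back, data$mpg)
```

The same helper could be applied to model predictions made on the baked scale. The round-trip check at the end should reproduce the original `mpg` values up to floating-point error.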

Created on 2021-01-25 by the reprex package (v0.3.0)