Saving the residuals from a regression to the original data frame

Ton*_*016 4 r dplyr

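I can't comment directly on the page in question, but basically I'm trying to get the code from the question on using dplyr::do() in combination with dplyr::mutate to work.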

dat <- mtcars

dat %>% 
    group_by(gear) %>% 
    mutate(res = residuals(lm(deparse(substitute(mpg ~ disp)))))

Running the code above, I get:

"Error in eval(substitute(expr), envir, enclos) : object 'mpg' not found"

Am I missing something?

ali*_*ire 7

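There are many options here, including modelr::add_residuals (see @LmW's answer), broom::augment, and plain old residuals. If you're working with grouped models, nesting the models in a list column is very convenient, and leads naturally to iterating over the list of models to calculate residuals and the like.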


residuals

Plain old base R works neatly with a little purrr (use lapply if you prefer):

library(tidyverse)

mtcars %>% 
    rownames_to_column('car') %>% 
    nest(-gear) %>% 
    mutate(model = map(data, ~lm(mpg ~ disp, data = .x)),
           resid = map(model, residuals)) %>%
    unnest(data, resid)

#> # A tibble: 32 × 13
#>     gear       resid            car   mpg   cyl  disp    hp  drat    wt
#>    <dbl>       <dbl>          <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1      4  0.98649891      Mazda RX4  21.0     6 160.0   110  3.90 2.620
#> 2      4  0.98649891  Mazda RX4 Wag  21.0     6 160.0   110  3.90 2.875
#> 3      4 -3.56856040     Datsun 710  22.8     4 108.0    93  3.85 2.320
#> 4      4  2.76107028      Merc 240D  24.4     4 146.7    62  3.69 3.190
#> 5      4  0.44001547       Merc 230  22.8     4 140.8    95  3.92 3.150
#> 6      4  0.11531527       Merc 280  19.2     6 167.6   123  3.92 3.440
#> 7      4 -1.28468473      Merc 280C  17.8     6 167.6   123  3.92 3.440
#> 8      4  2.45060811       Fiat 128  32.4     4  78.7    66  4.08 2.200
#> 9      4  0.08397007    Honda Civic  30.4     4  75.7    52  4.93 1.615
#> 10     4  3.02179175 Toyota Corolla  33.9     4  71.1    65  4.22 1.835
#> # ... with 22 more rows, and 4 more variables: qsec <dbl>, vs <dbl>,
#> #   am <dbl>, carb <dbl>

You can also wrap the lm call directly in residuals:

mtcars %>% 
    rownames_to_column('car') %>% 
    group_by(gear) %>% 
    mutate(resid = residuals(lm(mpg ~ disp)))

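This gives the same result, but the approach is inadvisable unless you're sure you won't want to do anything else with the models. (With the nested approach you can of course discard the models as well, but you control whether and when that happens, and whether to save a copy by breaking the chain earlier.)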
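For readers on a recent tidyr, here is a minimal sketch of the nested pipeline from this section. It assumes tidyr 1.0.0 or later, where, as far as I know, nest(-gear) and unnest(data, resid) still run but emit deprecation warnings. The names models and with_resid are hypothetical, introduced only for illustration. Breaking the chain after the mutate step keeps a copy of the fitted models, and the model column is dropped only at the point where it is no longer needed.

```r
library(tidyverse)

# Fit one model per gear group and keep the fits in a list column.
# Stopping the chain here saves a copy of the models for later use.
models <- mtcars %>%
    rownames_to_column('car') %>%
    nest(data = -gear) %>%   # tidyr >= 1.0.0 syntax (assumed version)
    mutate(model = map(data, ~lm(mpg ~ disp, data = .x)),
           resid = map(model, residuals))

# Drop the model column only once it is no longer needed,
# then expand the data and the residuals side by side.
with_resid <- models %>%
    select(-model) %>%
    unnest(c(data, resid))
```

The models object remains available afterwards, for example for map(models$model, summary).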


broom::augment

augment adds a number of useful variables, including residuals, and can be used in a similar way:

mtcars %>% 
    rownames_to_column('car') %>%
    nest(-gear) %>% 
    mutate(model = map(data, ~lm(mpg ~ disp, data = .x)), 
           model_data = map(model, broom::augment)) %>% 
    unnest(model_data)

#> # A tibble: 32 × 10
#>     gear   mpg  disp  .fitted   .se.fit      .resid       .hat   .sigma
#>    <dbl> <dbl> <dbl>    <dbl>     <dbl>       <dbl>      <dbl>    <dbl>
#> 1      4  21.0 160.0 20.01350 0.9758770  0.98649891 0.16546553 2.503083
#> 2      4  21.0 160.0 20.01350 0.9758770  0.98649891 0.16546553 2.503083
#> 3      4  22.8 108.0 26.36856 0.7466989 -3.56856040 0.09687426 2.197330
#> 4      4  24.4 146.7 21.63893 0.8206560  2.76107028 0.11701449 2.331455
#> 5      4  22.8 140.8 22.35998 0.7674126  0.44001547 0.10232345 2.524090
#> 6      4  19.2 167.6 19.08468 1.0800836  0.11531527 0.20268993 2.528466
#> 7      4  17.8 167.6 19.08468 1.0800836 -1.28468473 0.20268993 2.482941
#> 8      4  32.4  78.7 29.94939 1.0762841  2.45060811 0.20126638 2.357875
#> 9      4  30.4  75.7 30.31603 1.1195513  0.08397007 0.21777368 2.528634
#> 10     4  33.9  71.1 30.87821 1.1879209  3.02179175 0.24518417 2.247410
#> # ... with 22 more rows, and 2 more variables: .cooksd <dbl>,
#> #   .std.resid <dbl>

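If you want to keep the variables from the original data that the model does not use, change the model_data argument to model_data = map2(model, data, broom::augment), which passes augment a data argument instead of letting it default to the data used by the model.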
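As a sketch of that variant, the pipeline below passes each group's data to augment as its second argument, so columns such as car and hp survive alongside .fitted and .resid. It assumes tidyr 1.0.0 or later (hence nest(data = -gear)) and an installed broom. The name augmented and the select step are illustrative additions, not part of the code above.

```r
library(tidyverse)

augmented <- mtcars %>%
    rownames_to_column('car') %>%
    nest(data = -gear) %>%   # tidyr >= 1.0.0 syntax (assumed version)
    mutate(model = map(data, ~lm(mpg ~ disp, data = .x)),
           # augment(model, data): the second argument supplies the full data,
           # so variables the model did not use are kept in the result
           model_data = map2(model, data, broom::augment)) %>%
    select(gear, model_data) %>%   # drop the remaining list columns
    unnest(model_data)
```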


LmW*_*mW. 5

modelr::add_residuals() should do what you want:

require(tidyverse)
require(modelr)

models <- mtcars %>% 
    group_by(gear) %>% 
    nest() %>%
    mutate(model = map(data, ~lm(mpg ~ disp, data = .)),
           residuals = map2(data, model, add_residuals))

models %>% unnest(residuals)

# A tibble: 32 × 12
    gear   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  carb
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1      4  21.0     6 160.0   110  3.90 2.620 16.46     0     1     4
2      4  21.0     6 160.0   110  3.90 2.875 17.02     0     1     4
3      4  22.8     4 108.0    93  3.85 2.320 18.61     1     1     1
4      4  24.4     4 146.7    62  3.69 3.190 20.00     1     0     2
5      4  22.8     4 140.8    95  3.92 3.150 22.90     1     0     2
6      4  19.2     6 167.6   123  3.92 3.440 18.30     1     0     4
7      4  17.8     6 167.6   123  3.92 3.440 18.90     1     0     4
8      4  32.4     4  78.7    66  4.08 2.200 19.47     1     1     1
9      4  30.4     4  75.7    52  4.93 1.615 18.52     1     1     2
10     4  33.9     4  71.1    65  4.22 1.835 19.90     1     1     1
# ... with 22 more rows, and 1 more variables: resid <dbl>

Check out the modelr documentation; I find it very handy.
