I can't comment directly on the question in question, but basically I'm trying to get the code from "dplyr::do() in combination with dplyr::mutate?" to work:
dat <- mtcars
dat %>%
group_by(gear) %>%
mutate(res = residuals(lm(deparse(substitute(mpg ~ disp)))))
Running the code above, I get:
Error in eval(substitute(expr), envir, enclos) : object 'mpg' not found
Am I missing something?
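There are plenty of options here, including modelr::add_residuals (see @LmW's answer), broom::augment, and plain old residuals. If you're working with grouped models, nesting the models in a list column is very convenient, and it leads naturally to iterating over the list of models to compute residuals and the like.
residuals
Plain old base R residuals works neatly with a little purrr (use lapply if you prefer):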
library(tidyverse)
mtcars %>%
rownames_to_column('car') %>%
nest(-gear) %>%
mutate(model = map(data, ~lm(mpg ~ disp, data = .x)),
resid = map(model, residuals)) %>%
unnest(data, resid)
#> # A tibble: 32 × 13
#> gear resid car mpg cyl disp hp drat wt
#> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 0.98649891 Mazda RX4 21.0 6 160.0 110 3.90 2.620
#> 2 4 0.98649891 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875
#> 3 4 -3.56856040 Datsun 710 22.8 4 108.0 93 3.85 2.320
#> 4 4 2.76107028 Merc 240D 24.4 4 146.7 62 3.69 3.190
#> 5 4 0.44001547 Merc 230 22.8 4 140.8 95 3.92 3.150
#> 6 4 0.11531527 Merc 280 19.2 6 167.6 123 3.92 3.440
#> 7 4 -1.28468473 Merc 280C 17.8 6 167.6 123 3.92 3.440
#> 8 4 2.45060811 Fiat 128 32.4 4 78.7 66 4.08 2.200
#> 9 4 0.08397007 Honda Civic 30.4 4 75.7 52 4.93 1.615
#> 10 4 3.02179175 Toyota Corolla 33.9 4 71.1 65 4.22 1.835
#> # ... with 22 more rows, and 4 more variables: qsec <dbl>, vs <dbl>,
#> # am <dbl>, carb <dbl>
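In tidyr 1.0 and later, nest(-gear) and passing several columns to unnest() as separate arguments are deprecated. The sketch below is the same pipeline written against the newer interface, assuming tidyr >= 1.0 plus dplyr, purrr and tibble are installed; the object name resid_df is just for illustration. The column order of the result differs from the output printed above.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

resid_df <- mtcars %>%
  rownames_to_column("car") %>%
  # tidyr >= 1.0 spelling of nest(-gear): name the list-column explicitly
  nest(data = -gear) %>%
  mutate(model = map(data, ~ lm(mpg ~ disp, data = .x)),
         resid = map(model, residuals)) %>%
  # drop the model list-column, then unnest the data and residuals side by side
  select(-model) %>%
  unnest(c(data, resid))
```

Because each model includes an intercept, the residuals within every gear group should sum to (numerically) zero, which is a quick sanity check on the unnesting.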
You can also wrap the lm call directly in residuals:
mtcars %>%
rownames_to_column('car') %>%
group_by(gear) %>%
mutate(resid = residuals(lm(mpg ~ disp)))
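This gives the same result, but the approach is inadvisable unless you're sure you won't want to do anything else with the models. (Throwing the models away is obviously not required, but with the nested approach you control whether and when that happens, and whether to keep a copy by breaking the chain earlier.)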
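As a rough sketch of "breaking the chain earlier": stop once the models are fitted and keep them in their own object, so they can be reused for other things besides residuals. This assumes tidyr >= 1.0; the names models, model_summary, slope and r_squared are hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Stop the chain once the per-gear models are fitted and keep them around.
models <- mtcars %>%
  nest(data = -gear) %>%
  mutate(model = map(data, ~ lm(mpg ~ disp, data = .x)))

# The stored models can then be reused, e.g. for per-group slope and R-squared.
model_summary <- models %>%
  transmute(gear,
            slope = map_dbl(model, ~ coef(.x)[["disp"]]),
            r_squared = map_dbl(model, ~ summary(.x)$r.squared))
```

Residuals can still be computed from models afterwards with map(models$model, residuals), so nothing is lost by stopping early.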
broom::augment
augment adds a number of useful variables, including the residuals, and can be used the same way:
mtcars %>%
rownames_to_column('car') %>%
nest(-gear) %>%
mutate(model = map(data, ~lm(mpg ~ disp, data = .x)),
model_data = map(model, broom::augment)) %>%
unnest(model_data)
#> # A tibble: 32 × 10
#> gear mpg disp .fitted .se.fit .resid .hat .sigma
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 21.0 160.0 20.01350 0.9758770 0.98649891 0.16546553 2.503083
#> 2 4 21.0 160.0 20.01350 0.9758770 0.98649891 0.16546553 2.503083
#> 3 4 22.8 108.0 26.36856 0.7466989 -3.56856040 0.09687426 2.197330
#> 4 4 24.4 146.7 21.63893 0.8206560 2.76107028 0.11701449 2.331455
#> 5 4 22.8 140.8 22.35998 0.7674126 0.44001547 0.10232345 2.524090
#> 6 4 19.2 167.6 19.08468 1.0800836 0.11531527 0.20268993 2.528466
#> 7 4 17.8 167.6 19.08468 1.0800836 -1.28468473 0.20268993 2.482941
#> 8 4 32.4 78.7 29.94939 1.0762841 2.45060811 0.20126638 2.357875
#> 9 4 30.4 75.7 30.31603 1.1195513 0.08397007 0.21777368 2.528634
#> 10 4 33.9 71.1 30.87821 1.1879209 3.02179175 0.24518417 2.247410
#> # ... with 22 more rows, and 2 more variables: .cooksd <dbl>,
#> # .std.resid <dbl>
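If you want to keep the variables from the original data that the model doesn't use, change the model_data line to model_data = map2(model, data, broom::augment)), which passes augment a data argument instead of letting it default to the data the model was fitted on.
modelr::add_residuals() should do what you want: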
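A sketch of that map2 variant in full, assuming tidyr >= 1.0 and broom are installed (augmented is an illustrative name). map2() hands each group's data to augment() as its second argument, which for lm models is the data argument, so columns such as car and cyl survive alongside .fitted and .resid.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(broom)

augmented <- mtcars %>%
  rownames_to_column("car") %>%
  nest(data = -gear) %>%
  mutate(model = map(data, ~ lm(mpg ~ disp, data = .x)),
         # second positional argument of augment() for lm is `data`,
         # so unused columns (car, cyl, hp, ...) are kept in the output
         model_data = map2(model, data, augment)) %>%
  select(gear, model_data) %>%
  unnest(model_data)
```

Exactly which diagnostic columns appear (for example .se.fit) depends on the broom version.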
library(tidyverse)
library(modelr)

models <- mtcars %>%
  group_by(gear) %>%
  nest() %>%
  mutate(model = map(data, ~lm(mpg ~ disp, data = .)),
         residuals = map2(data, model, add_residuals))

models %>% unnest(residuals)

# A tibble: 32 × 12
 gear mpg cyl disp hp drat wt qsec vs am carb
 <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4
2 4 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4
3 4 22.8 4 108.0 93 3.85 2.320 18.61 1 1 1
4 4 24.4 4 146.7 62 3.69 3.190 20.00 1 0 2
5 4 22.8 4 140.8 95 3.92 3.150 22.90 1 0 2
6 4 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4
7 4 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4
8 4 32.4 4 78.7 66 4.08 2.200 19.47 1 1 1
9 4 30.4 4 75.7 52 4.93 1.615 18.52 1 1 2
10 4 33.9 4 71.1 65 4.22 1.835 19.90 1 1 1
# ... with 22 more rows, and 1 more variables: resid <dbl>

Have a look at the modelr documentation; I find it very handy.
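To see what add_residuals() does on its own, here is a minimal single-model sketch (fit, with_resid and with_resid2 are illustrative names). It appends observed minus predicted values as a new column, named "resid" by default; the var argument picks a different name.

```r
library(modelr)

# One ungrouped model, just to show the add_residuals() call itself
fit <- lm(mpg ~ disp, data = mtcars)

# Default: a new column called "resid"
with_resid <- add_residuals(mtcars, fit)

# The var argument names the new column
with_resid2 <- add_residuals(mtcars, fit, var = "mpg_resid")
```

For an lm fitted without missing values, the added column matches residuals(fit).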