在 R 中迭代多个回归模型和数据子集

use*_*932 5 regression r broom purrr tidyverse

我正在尝试学习如何使用 R 中的 purrr 和 broom 包在数据集的子集上自动运行 3 个或更多回归模型。我正在使用 nest %>% mutate(map()) %>% unnest()心流。

当只有一个回归模型应用于多个数据子集时,我能够在线复制示例。但是,当我的函数中有多个回归模型时,我会遇到问题。

我试图做的

library(tidyverse)
library(broom)

estimate_model <- function(df) {
  model1 <- lm(mpg ~ wt, data = df)
  model2 <- lm(mpg ~ wt + gear, data = df)
  model3 <- lm(mpg ~ wt + gear + vs, data = df)
}

ols_1dep_3specs <- mtcars %>%
    nest(-cyl) %>%
    mutate(
       estimates = map(data, estimate_model), # want to run several models at once
       coef_wt = map(estimate,  ~pluck(coef(.), "wt")), # coefficient of wt only
       se_wt = map(estimate, ~pluck(tidy(.), "std.error")[[2]]), # se of wt only
       rsq = map(model, ~pluck(glance(.), "r.squared")),
       arsq = map(model, ~pluck(glance(.), "adj.r.squared"))
    ) %>%
    unnest(coef_wt, se_wt, rsq, arsq)

ols_1dep_3specs

Run Code Online (Sandbox Code Playgroud)

不幸的是,这似乎只在函数estimate_model只包含一个回归模型时才有效。当有多个规范时,关于如何编写代码的任何建议?向 nest() %>% mutate(map()) %>% nest() 框架之外的建议开放。


以下代码达到了我希望实现的目标,但涉及大量重复。

estimate_model1 <- function(df) {
  model1 <- lm(mpg ~ wt, data = df)
}
estimate_model2 <- function(df) {
  model2 <- lm(mpg ~ wt + gear, data = df)
}
estimate_model3 <- function(df) {
  model3 <- lm(mpg ~ wt + gear + vs, data = df)
}

ols_1dep_3specs <- mtcars %>%
  nest(-cyl) %>%
  mutate(model_1 = map(data, estimate_model1),
         model_2 = map(data, estimate_model2),
         model_3 = map(data, estimate_model3)) %>%
  mutate(coef_wt_1 = map_dbl(model_1, ~pluck(coef(.), "wt")),
         coef_wt_2 = map_dbl(model_2, ~pluck(coef(.), "wt")),
         coef_wt_3 = map_dbl(model_3, ~pluck(coef(.), "wt")),
         rsq_1 = map_dbl(model_1, ~pluck(glance(.), "r.squared")),
         rsq_2 = map_dbl(model_2, ~pluck(glance(.), "r.squared")),
         rsq_3 = map_dbl(model_3, ~pluck(glance(.), "r.squared"))) %>% 
  dplyr::select(starts_with("coef_wt"), starts_with("rsq")) 

Run Code Online (Sandbox Code Playgroud)

akr*_*run 3

在函数中,没有返回调用,最好将所有模型放在一个list

\n\n
estimate_model <- function(df) {\n        model1 <- lm(mpg ~ wt, data = df)\n        model2 <- lm(mpg ~ wt + gear, data = df)\n        model3 <- lm(mpg ~ wt + gear + vs, data = df)\n        list(model1, model2, model3)\n      }\n
Run Code Online (Sandbox Code Playgroud)\n\n

list然后通过循环每个元素来应用第一段代码

\n\n
mtcars %>% \n  group_by(cyl) %>%\n  nest() %>% \n  mutate(estimates = map(data, estimate_model),\n         coef_wt = map(estimates,  ~map_dbl(.x, ~ pluck(coef(.x), "wt"))),\n         se_wt = map(estimates, ~map_dbl(.x, ~pluck(tidy(.x), "std.error")[[2]])), \n         rsq = map(estimates, ~ map_dbl(.x, ~pluck(glance(.x), "r.squared"))),\n          arsq = map(estimates, ~map_dbl(.x, ~ pluck(glance(.x), "adj.r.squared")))) %>%\n  unnest(c(coef_wt, se_wt, rsq, arsq))\n# A tibble: 9 x 7\n# Groups:   cyl [3]\n#    cyl            data estimates  coef_wt se_wt   rsq  arsq\n#  <dbl> <list<df[,10]>> <list>       <dbl> <dbl> <dbl> <dbl>\n#1     6        [7 \xc3\x97 10] <list [3]>   -2.78 1.33  0.465 0.357\n#2     6        [7 \xc3\x97 10] <list [3]>   -3.92 1.41  0.660 0.489\n#3     6        [7 \xc3\x97 10] <list [3]>   -6.19 4.49  0.690 0.379\n#4     4       [11 \xc3\x97 10] <list [3]>   -5.65 1.85  0.509 0.454\n#5     4       [11 \xc3\x97 10] <list [3]>   -5.38 2.08  0.517 0.396\n#6     4       [11 \xc3\x97 10] <list [3]>   -5.13 2.16  0.555 0.364\n#7     8       [14 \xc3\x97 10] <list [3]>   -2.19 0.739 0.423 0.375\n#8     8       [14 \xc3\x97 10] <list [3]>   -2.43 0.798 0.459 0.361\n#9     8       [14 \xc3\x97 10] <list [3]>   -2.43 0.798 0.459 0.361\n
Run Code Online (Sandbox Code Playgroud)\n