Muh*_*mil 1 r dplyr purrr tidyverse
n <- 3
strata <- rep(1:4, each=n)
y <- rnorm(n =12)
x <- 1:12
category <- rep(c("A", "B", "C"), times = 4)
df <- cbind.data.frame(y, x, strata, category)
Run Code Online (Sandbox Code Playgroud)
我想首先按“层”将我的数据拆分为一个列表,然后我想再次按“类别”拆分新列表中的所有数据框。最后,我想在每个结果数据框内对 x 上的 y 进行回归(在这种情况下,每个数据框将是一行,但在实际数据中,每个层的长度不同,层内的类别数也不同)。
R 中的规范方法是使用split:
L <- split(df, df[,c("strata","category")])
L
# $`1.A`
# y x strata category
# 1 -1.120867 1 1 A
# $`2.A`
# y x strata category
# 4 -1.023001 4 2 A
# $`3.A`
# y x strata category
# 7 0.5411806 7 3 A
# $`4.A`
# y x strata category
# 10 1.546789 10 4 A
# $`1.B`
# y x strata category
# 2 0.6730641 2 1 B
# $`2.B`
# y x strata category
# 5 -1.466816 5 2 B
# $`3.B`
# y x strata category
# 8 -0.1955617 8 3 B
# $`4.B`
# y x strata category
# 11 -0.660904 11 4 B
# $`1.C`
# y x strata category
# 3 -0.9880206 3 1 C
# $`2.C`
# y x strata category
# 6 0.4111802 6 2 C
# $`3.C`
# y x strata category
# 9 -0.03311637 9 3 C
# $`4.C`
# y x strata category
# 12 0.6799109 12 4 C
Run Code Online (Sandbox Code Playgroud)
12 元素列表(此处)的名称是两个分类变量的字符串连接,.-delimited;这很容易被覆盖(手动)。
从这里开始,要对每个元素进行回归,您可能会执行以下操作:
models <- lapply(L, function(x) lm(..., data=x))
Run Code Online (Sandbox Code Playgroud)
(或您计划使用的任何回归工具)。
如果你愿意,你可以一步完成
results <- by(df, df[,c("strata","category")], function(x) lm(..., data=x))
Run Code Online (Sandbox Code Playgroud)
好处是它可以一步完成。该by回报可以看有点古怪,但它实际上只是list一些特殊的print.by使用方法; 您仍然可以根据需要像列表一样引用它。
另一种方法是dplyr:
library(dplyr)
results <- df %>%
group_by(strata, category) %>%
summarize(model = list(lm(y ~ x)))
results
# # A tibble: 12 x 3
# # Groups: strata [4]
# strata category model
# <int> <chr> <list>
# 1 1 A <lm>
# 2 1 B <lm>
# 3 1 C <lm>
# 4 2 A <lm>
# 5 2 B <lm>
# 6 2 C <lm>
# 7 3 A <lm>
# 8 3 B <lm>
# 9 3 C <lm>
# 10 4 A <lm>
# 11 4 B <lm>
# 12 4 C <lm>
results$model[[1]]
# Call:
# lm(formula = y ~ x)
# Coefficients:
# (Intercept) x
# -1.121 NA
Run Code Online (Sandbox Code Playgroud)
正如 Onyambu 所指出的(谢谢!),这很有效(没有data=),因为我们明确列出了变量,它们会被找到。.例如,如果您的回归使用,您可能希望将其正式化
results <- df %>%
group_by(strata, category) %>%
summarize(model = list(lm(y ~ ., data = cur_data())))
Run Code Online (Sandbox Code Playgroud)
y~x没有它会工作,但y~.不会,因此data=cur_data()。
我们稍微修改了输入,使分组列成为因子,并在最后的注释中为每组提供更多数据。
1) lmList然后我们使用lmListfrom nlme ,它可以按组进行回归。我们已经使用过,pool=FALSE但您可以省略它并pool=TRUE根据您的需要使用默认值。详情请参阅?lmList。
library(nlme)
fm <- lmList(y ~ x | g, transform(df, g = strata:category), pool = FALSE)
summary(fm)
Run Code Online (Sandbox Code Playgroud)
给予:
Call:
Model: y ~ x | g
Data: print(transform(df, g = strata:category))
Coefficients:
(Intercept)
Estimate Std. Error t value Pr(>|t|)
1:A -0.6508622 0.7185126 -0.9058467 0.53142497
1:B -1.8043171 1.5367930 -1.1740794 0.44913412
2:A 4.0963651 1.4952687 2.7395512 0.22281400
2:B 3.5787230 0.1888514 18.9499422 0.03356368
x
Estimate Std. Error t value Pr(>|t|)
1:A -0.03320257 0.21035895 -0.1578377 0.9003396
1:B 0.33526295 0.35569847 0.9425482 0.5188229
2:A -0.45361115 0.16347186 -2.7748577 0.2202010
2:B -0.35620163 0.01863826 -19.1113091 0.0332808
Run Code Online (Sandbox Code Playgroud)
2) lm 我们可以交替地制定lm具有单独截距和斜率的单个模型。这会在每个组内创建单独的截距和斜率模型矩阵列。看看model.matrix(y ~ (strata:category)/(x + 1) - 1, df)以获得洞察力。
fm2 <- lm(y ~ (strata:category)/(x + 1) - 1, df)
summary(fm2)
Run Code Online (Sandbox Code Playgroud)
给予:
Call:
lm(formula = y ~ (strata:category)/(x + 1) - 1, data = df)
Residuals:
1 2 3 4 5 6 7 8
0.24290 0.41073 -0.48580 -0.82145 0.24290 0.41073 0.18876 -0.02152
9 10 11 12
-0.37752 0.04304 0.18876 -0.02152
Coefficients:
Estimate Std. Error t value Pr(>|t|)
strata1:categoryA -0.6509 0.7596 -0.857 0.440
strata2:categoryA 4.0964 2.0343 2.014 0.114
strata1:categoryB -1.8043 0.9609 -1.878 0.134
strata2:categoryB 3.5787 2.2534 1.588 0.187
strata1:categoryA:x -0.0332 0.2224 -0.149 0.889
strata2:categoryA:x -0.4536 0.2224 -2.040 0.111
strata1:categoryB:x 0.3353 0.2224 1.507 0.206
strata2:categoryB:x -0.3562 0.2224 -1.602 0.184
Residual standard error: 0.629 on 4 degrees of freedom
Multiple R-squared: 0.7886, Adjusted R-squared: 0.3658
F-statistic: 1.865 on 8 and 4 DF, p-value: 0.2862
Run Code Online (Sandbox Code Playgroud)
3) dplyr/扫帚
我们可以像这样生成一个列表或tibble。
3a) group_map
library(broom)
library(dplyr)
df %>%
group_by(strata, category) %>%
group_map(~ tidy(lm(y ~ x, .)))
Run Code Online (Sandbox Code Playgroud)
给予:
[[1]]
# A tibble: 2 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -0.651 0.719 -0.906 0.531
2 x -0.0332 0.210 -0.158 0.900
[[2]]
# A tibble: 2 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) -1.80 1.54 -1.17 0.449
2 x 0.335 0.356 0.943 0.519
[[3]]
# A tibble: 2 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 4.10 1.50 2.74 0.223
2 x -0.454 0.163 -2.77 0.220
[[4]]
# A tibble: 2 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 3.58 0.189 18.9 0.0336
2 x -0.356 0.0186 -19.1 0.0333
Run Code Online (Sandbox Code Playgroud)
3b) group_modify
library(broom)
library(dplyr)
df %>%
group_by(strata, category) %>%
group_modify(~ tidy(lm(y ~ x, .))) %>%
ungroup
Run Code Online (Sandbox Code Playgroud)
给予:
# A tibble: 8 x 7
strata category term estimate std.error statistic p.value
<fct> <fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 A (Intercept) -0.651 0.719 -0.906 0.531
2 1 A x -0.0332 0.210 -0.158 0.900
3 1 B (Intercept) -1.80 1.54 -1.17 0.449
4 1 B x 0.335 0.356 0.943 0.519
5 2 A (Intercept) 4.10 1.50 2.74 0.223
6 2 A x -0.454 0.163 -2.77 0.220
7 2 B (Intercept) 3.58 0.189 18.9 0.0336
8 2 B x -0.356 0.0186 -19.1 0.0333
Run Code Online (Sandbox Code Playgroud)
4) listcompr另一种方法是通过 listcompr 包使用列表推导式。第一个参数是组件名称的模板,第二个参数是表达式——在本例中是lm调用,其余参数定义索引s和c。
library(listcompr)
with(df, gen.named.list(str = "{s}.{c}",
expr = lm(y ~ x, subset = strata == s & category == c),
s = levels(strata), c = levels(category)))
Run Code Online (Sandbox Code Playgroud)
给出这个命名的lm对象列表:
$`1.A`
Call:
lm(formula = y ~ x, subset = strata == s & category == c)
Coefficients:
(Intercept) x
-0.6509 -0.0332
$`2.A`
Call:
lm(formula = y ~ x, subset = strata == s & category == c)
Coefficients:
(Intercept) x
4.0964 -0.4536
$`1.B`
Call:
lm(formula = y ~ x, subset = strata == s & category == c)
Coefficients:
(Intercept) x
-1.8043 0.3353
$`2.B`
Call:
lm(formula = y ~ x, subset = strata == s & category == c)
Coefficients:
(Intercept) x
3.5787 -0.3562
Run Code Online (Sandbox Code Playgroud)