When I run the following:
data_control %>%
group_by(politics, partner_politics) %>%
summarize(pd_sent_amount = mean(as.numeric(pd_sent_amount)),
n = n(),
pd_sent_amount_sd = sd(as.numeric(pd_sent_amount), na.rm = T)
)
I get this output:
# A tibble: 4 x 5
# Groups: politics [?]
politics partner_politics pd_sent_amount n pd_sent_amount_sd
<fct> <fct> <dbl> <int> <dbl>
1 Democrat Democrat 0.598 76 NA
2 Democrat Republican 0.479 34 NA
3 Republican Democrat 0.404 34 NA
4 Republican Republican 0.404 70 NA
I'm not sure why the standard deviation comes out as NA, since I can compute it by hand for each group, for example:
test = subset(data_control, politics == "Democrat" & partner_politics == "Democrat")
with(test, sd(pd_sent_amount) / sqrt(nrow(test)))
> with(test, sd(pd_sent_amount) / sqrt(nrow(test)))
[1] 0.05008275
Here is a sample of the data:
structure(list(politics = structure(c(1L, 2L, 1L, 2L, 1L, 1L,
2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L,
1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L), .Label = c("Democrat", "Republican"
), class = "factor"), partner_politics = structure(c(2L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L), .Label = c("Democrat",
"Republican"), class = "factor"), pd_sent_amount = c(0.2, 0,
0.75, 0, 0, 0, 0, 0, 0.5, 0, 1, 0, 1, 0.5, 1, 1, 1, 0.5, 1, 0.5,
1, 1, 0.25, 0, 0, 0.25, 0, 0, 0.5, 1)), row.names = 5:34, class = "data.frame")
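The call to sd() is referring to the pd_sent_amount that summarize() has just overwritten in place. By that point the column holds only the group mean, a single value per group, and the standard deviation of a single value is NA. Give the summary column a different name: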
data_control %>%
group_by(politics, partner_politics) %>%
summarize(pd_sent_amount_mean = mean(as.numeric(pd_sent_amount)),
n = n(),
pd_sent_amount_sd = sd(as.numeric(pd_sent_amount), na.rm = TRUE)
)
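The fourth example in the dplyr documentation for summarise() notes that "newly created summaries immediately overwrite existing variables", and that example is essentially the same as your case: it calls mean() and then sd() on the same column name.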
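The overwrite behaviour can be reproduced on a tiny data frame. This is only a sketch: toy, g and x are made-up names for illustration and are not part of data_control. It also shows that keeping the original name works if sd() is computed before mean() replaces the column.

```r
library(dplyr)

# Hypothetical toy data (not the asker's data_control)
toy <- data.frame(
  g = c("a", "a", "b", "b"),
  x = c(1, 3, 2, 6)
)

# Reusing the name x: by the time sd() runs, x is the one-value
# group mean, so sd() returns NA for every group
overwritten <- toy %>%
  group_by(g) %>%
  summarize(x = mean(x), x_sd = sd(x))

# Distinct name for the mean: sd() still sees the original column
renamed <- toy %>%
  group_by(g) %>%
  summarize(x_mean = mean(x), x_sd = sd(x))

# Same name kept, but sd() is computed first, before x is overwritten
reordered <- toy %>%
  group_by(g) %>%
  summarize(x_sd = sd(x), x = mean(x))
```

Renaming is usually the safer of the two fixes, because the result no longer depends on the order of the arguments to summarize().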