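I'm having some trouble with dplyr in R. I want to group by two different factors in succession and get the sum of another variable for each combination.
Here is a reproducible example: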
df <- data.frame(c("A", "A", "A", "B", "C", "C", "C"),
                 c("1", "1", "3", "2", "3", "2", "2"),
                 c(12, 45, 78, 32, 5, 7, 8))
colnames(df) <- c("factor1", "factor2", "values")
Here is what I have tried so far:
library(dplyr)

test <- df %>%
  group_by(factor1, factor2) %>%
  summarise(sum(values))
# A tibble: 5 x 3
# Groups:   factor1 [3]
  factor1 factor2 `sum(values)`
  <fct>   <fct>           <dbl>
1 A       1                  57
2 A       3                  78
3 B       2                  32
4 C       2                  15
5 C       3                   5
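But this is not what I'm looking for. I want one row per level of factor1 and one column per level of factor2, with 0 filled in for combinations that do not occur, like this: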
   1  2  3
A 57  0 78
B  0 32  0
C  0 15  5
Any suggestions?
Use pivot_wider from tidyr:
tidyr::pivot_wider(df, names_from = factor2, values_from = values,
                   values_fn = sum, values_fill = 0)
#   factor1   `1`   `3`   `2`
#   <chr>   <dbl> <dbl> <dbl>
# 1 A          57    78     0
# 2 B           0     0    32
# 3 C           0     5    15
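The columns above come out in order of first appearance (1, 3, 2) rather than the 1, 2, 3 order shown in the question. A minimal sketch of one way to get the sorted order, assuming tidyr 1.1.0 or later, where pivot_wider has a names_sort argument; the variable name wide is only illustrative:

```r
library(tidyr)

# Same data as in the question, built with named columns for brevity
df <- data.frame(
  factor1 = c("A", "A", "A", "B", "C", "C", "C"),
  factor2 = c("1", "1", "3", "2", "3", "2", "2"),
  values  = c(12, 45, 78, 32, 5, 7, 8)
)

wide <- pivot_wider(
  df,
  names_from  = factor2,
  values_from = values,
  values_fn   = sum,    # sum duplicates within each factor1/factor2 cell
  values_fill = 0,      # use 0 for combinations that never occur
  names_sort  = TRUE    # sort new column names instead of first-appearance order
)

wide
```

Note that names_sort sorts the names as strings, so with levels such as "10" and "2" the order would be alphabetical, not numeric.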
Or with data.table:
library(data.table)
dcast(setDT(df), factor1 ~ factor2, value.var = 'values', fun.aggregate = sum)
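If no extra package is wanted, base R's xtabs may also do the job. This is an alternative sketch, not part of the answers above; it returns a contingency table of sums, with 0 for combinations that do not occur. The names tab and wide_df are only illustrative:

```r
# Same data as in the question, built with named columns for brevity
df <- data.frame(
  factor1 = c("A", "A", "A", "B", "C", "C", "C"),
  factor2 = c("1", "1", "3", "2", "3", "2", "2"),
  values  = c(12, 45, 78, 32, 5, 7, 8)
)

# Left-hand side is the variable to sum; right-hand side gives rows and columns
tab <- xtabs(values ~ factor1 + factor2, data = df)
tab

# Optional: turn the table into a plain data frame with factor1 as row names
wide_df <- as.data.frame.matrix(tab)
```

The result is a table object rather than a tibble, so as.data.frame.matrix is the step to use when a regular data frame is needed downstream.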