gor*_*ryh 1 group-by r frequency dplyr
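I am trying to compute the proportion of each distinct value within each group, but instead of getting "new" rows for the groups I want new columns.
Taking the second question above as an example, suppose I have the following data: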
data <- structure(list(value = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L), class = structure(c(1L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("A",
"B"), class = "factor")), .Names = c("value", "class"), class = "data.frame", row.names = c(NA,
-16L))
I can compute the proportion of each value (1, 2, 3) within each class (A, B):
data %>%
group_by(value, class) %>%
summarise(n = n()) %>%
complete(class, fill = list(n = 0)) %>%
group_by(class) %>%
mutate(freq = n / sum(n))
# A tibble: 6 x 4
value class n freq
<int> <fctr> <dbl> <dbl>
1 1 A 3 0.2727273
2 1 B 3 0.6000000
3 2 A 4 0.3636364
4 2 B 2 0.4000000
5 3 A 4 0.3636364
6 3 B 0 0.0000000
However, this gives me one row per value/class pair, whereas I would like something like this:
# some code
# A tibble: 2 x 5
class n 1 2 3
<fctr> <dbl> <dbl> <dbl> <dbl>
1 A 11 0.2727273 0.3636364 0.3636364
2 B 5 0.6000000 0.4000000 0.0000000
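That is, one row per class and one column per value. I could write a for loop to build a new data frame from the old one, but I am sure there is a better way. Any suggestions?
Thanks.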
We can use pivot_wider at the end:
library(dplyr)
library(tidyr)
data %>%
group_by(value, class) %>%
summarise(n = n()) %>%
complete(class, fill = list(n = 0)) %>%
group_by(class) %>%
mutate(freq = n / sum(n), n = sum(n)) %>%
pivot_wider(names_from = value, values_from = freq)
# A tibble: 2 x 5
# Groups: class [2]
# class n `1` `2` `3`
# <fct> <dbl> <dbl> <dbl> <dbl>
#1 A 11 0.273 0.364 0.364
#2 B 5 0.6 0.4 0
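The n = sum(n) inside mutate() above matters: pivot_wider() treats every column other than the names and values columns as an identifier, so n has to be constant within a class for each class to collapse to one row. The following is a minimal sketch of that effect, not part of the original answer. It assumes dplyr 1.0.0 or later (for the .groups argument) and rebuilds the question's data in a compact form; the names freqs, too_long and one_per_class are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same values as the data in the question, rebuilt compactly.
data <- data.frame(
  value = rep(1:3, times = c(6, 6, 4)),
  class = factor(c("A", "A", "A", "B", "B", "B",
                   "A", "A", "A", "A", "B", "B",
                   "A", "A", "A", "A"))
)

# Long table of counts and within-class proportions.
freqs <- data %>%
  group_by(value, class) %>%
  summarise(n = n(), .groups = "drop") %>%
  complete(value, class, fill = list(n = 0L)) %>%
  group_by(class) %>%
  mutate(freq = n / sum(n)) %>%
  ungroup()

# Here n is still the per-(value, class) count, so it varies within a class
# and each distinct (class, n) pair becomes its own row, padded with NA.
too_long <- pivot_wider(freqs, names_from = value, values_from = freq)

# Overwriting n with the class total makes it constant within each class.
one_per_class <- freqs %>%
  group_by(class) %>%
  mutate(n = sum(n)) %>%
  ungroup() %>%
  pivot_wider(names_from = value, values_from = freq)

nrow(too_long)       # 5 rows
nrow(one_per_class)  # 2 rows, one per class
```

Dropping n before pivoting with select(-n) would also give one row per class, at the cost of losing the class totals.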
Or, as @IcecreamToucan mentioned, complete is not needed, because pivot_wider has an option to fill missing combinations with a custom value (the default is NA):
data %>%
group_by(value, class) %>%
summarise(n = n()) %>%
group_by(class) %>%
mutate(freq = n / sum(n), n = sum(n)) %>%
pivot_wider(names_from = value, values_from = freq, values_fill = list(freq = 0))
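To see what values_fill changes, the sketch below pivots the same long table twice, once with the default and once with a fill of 0. It is an illustration under assumptions rather than part of the answer: it assumes dplyr 1.0.0 or later (for .groups), rebuilds the question's data compactly, and uses the made-up names freqs, with_na and with_zero.

```r
library(dplyr)
library(tidyr)

# Same values as the data in the question, rebuilt compactly.
data <- data.frame(
  value = rep(1:3, times = c(6, 6, 4)),
  class = factor(c("A", "A", "A", "B", "B", "B",
                   "A", "A", "A", "A", "B", "B",
                   "A", "A", "A", "A"))
)

# No complete() here, so the (value = 3, class = B) pair is simply absent.
freqs <- data %>%
  group_by(value, class) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(class) %>%
  mutate(freq = n / sum(n), n = sum(n)) %>%
  ungroup()

# Default: the missing combination becomes NA in the wide table.
with_na <- pivot_wider(freqs, names_from = value, values_from = freq)

# With values_fill: the missing combination becomes 0 instead.
with_zero <- pivot_wider(freqs, names_from = value, values_from = freq,
                         values_fill = list(freq = 0))

with_na$`3`    # second element (class B) is NA
with_zero$`3`  # second element (class B) is 0
```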
If we are using an earlier version of tidyr (before 1.0.0, which introduced pivot_wider), use spread instead:
data %>%
group_by(value, class) %>%
summarise(n = n()) %>%
complete(class, fill = list(n = 0)) %>%
group_by(class) %>%
mutate(freq = n / sum(n), n = sum(n)) %>%
spread(value, freq)
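As a quick cross-check on any of the pipelines above, base R can produce the same within-class proportions directly from a contingency table. This is only a sanity-check sketch, not the answer's method; it returns a table object rather than a tibble and does not carry the n column. The name props is made up for illustration.

```r
# Same values as the data in the question, rebuilt compactly.
data <- data.frame(
  value = rep(1:3, times = c(6, 6, 4)),
  class = factor(c("A", "A", "A", "B", "B", "B",
                   "A", "A", "A", "A", "B", "B",
                   "A", "A", "A", "A"))
)

# Rows are classes, columns are values; margin = 1 divides each row by its total.
props <- prop.table(table(data$class, data$value), margin = 1)

round(props, 3)
rowSums(props)  # each class sums to 1
```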