Getting the frequency of columns in R

Sto*_*oof 3 r

I have data in a data frame in this format:

  grp1 grp2 grp3 grp4 result
1    0    1    0    0      1
2    1    0    0    0      0
3    0    0    0    1      1
4    0    0    0    1      1
5    1    0    0    0      0
6    0    1    0    0      1
.
.
.

It can be generated with:

set.seed(13)

groups <- c("grp1", "grp2", "grp3", "grp4", "result")

# Randomly assign each to group and a result
x <- do.call(rbind, lapply(1:50, function(x) c(sample(c(1,0,0,0), 4), sample(0:1, 1))))
df <- data.frame(x)
colnames(df) <- groups

My goal is to format the data as:

  group      freq
1  grp1 0.5625000
2  grp2 0.5000000
3  grp3 0.6250000
4  grp4 0.2857143

Here freq is the proportion of rows in each group that have a result (result == 1).

My attempt so far, using dplyr:

library(dplyr)

df %>% 
  group_by(grp1, grp2, grp3, grp4, result) %>% 
  summarize(n = n()) %>% 
  mutate(freq = n / sum(n)) %>%
  select(-n) %>%
  filter(result == 1)

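This gives the right numbers, but the groups are still spread across indicator columns instead of a single group column: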

  grp1 grp2 grp3 grp4 result      freq
1    0    0    0    1      1 0.5625000
2    0    0    1    0      1 0.5000000
3    0    1    0    0      1 0.6250000
4    1    0    0    0      1 0.2857143

Dav*_*urg 6

Here is a data.table attempt:

library(data.table)
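
# Reshape to long format (one row per original row and group column), then,
# for each group, divide its members with result == 1 by all of its members
melt(setDT(df), id.vars = "result")[
  , .(freq = sum(value[result == 1]) / sum(value)), by = variable]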
#    variable      freq
# 1:     grp1 0.2857143
# 2:     grp2 0.6250000
# 3:     grp3 0.5000000
# 4:     grp4 0.5625000
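
For comparison, the same calculation can be sketched in base R without reshaping. This is only an illustration of the idea behind the data.table answer, not part of it: the names grp_cols and out are made up here, and the loop variable in the data-generation step is renamed to i for readability. It assumes every group column is a 0/1 indicator and that each group has at least one member (an empty group would give NaN). Because the output of sample() for a given seed depends on the R version, the numbers may not match those printed above.

```r
# Rebuild the example data from the question
set.seed(13)
groups <- c("grp1", "grp2", "grp3", "grp4", "result")
x <- do.call(rbind, lapply(1:50, function(i) c(sample(c(1, 0, 0, 0), 4), sample(0:1, 1))))
df <- data.frame(x)
colnames(df) <- groups

# Indicator columns, i.e. everything except the outcome (hypothetical helper name)
grp_cols <- setdiff(names(df), "result")

# For each group, take the rows that belong to it (indicator == 1) and
# average their 0/1 result, which is the share of members with result == 1
freq <- sapply(df[grp_cols], function(g) mean(df$result[g == 1]))

# Assemble the long layout asked for in the question: one row per group
out <- data.frame(group = grp_cols, freq = unname(freq))
out
```

Taking the mean of a 0/1 vector is the same as dividing the count of ones by the group size, so this matches the sum(value[result == 1]) / sum(value) expression used in the data.table version.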