I have data in a data frame in this format:
  grp1 grp2 grp3 grp4 result
1    0    1    0    0      1
2    1    0    0    0      0
3    0    0    0    1      1
4    0    0    0    1      1
5    1    0    0    0      0
6    0    1    0    0      1
.
.
.
which can be generated with:
set.seed(13)
groups <- c("grp1", "grp2", "grp3", "grp4", "result")
# Randomly assign each row to exactly one group, plus a random 0/1 result
x <- do.call(rbind, lapply(1:50, function(x) c(sample(c(1,0,0,0), 4), sample(0:1, 1))))
df <- data.frame(x)
colnames(df) <- groups
My goal is to get the data into this form:
  group      freq
1  grp1 0.5625000
2  grp2 0.5000000
3  grp3 0.6250000
4  grp4 0.2857143
where freq is the fraction of rows in each group that have result equal to 1.
My attempt so far using dplyr:
library(dplyr)
df %>%
  group_by(grp1, grp2, grp3, grp4, result) %>%
  summarize(n = n()) %>%
  mutate(freq = n / sum(n)) %>%
  select(-n) %>%
  filter(result == 1)
which gives:
  grp1 grp2 grp3 grp4 result      freq
1    0    0    0    1      1 0.5625000
2    0    0    1    0      1 0.5000000
3    0    1    0    0      1 0.6250000
4    1    0    0    0      1 0.2857143
Here is a data.table approach:
library(data.table)
melt(setDT(df), "result")[, .(freq = sum(value[result == 1])/sum(value)), by = variable]
# variable freq
# 1: grp1 0.2857143
# 2: grp2 0.6250000
# 3: grp3 0.5000000
# 4: grp4 0.5625000
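For comparison, the same ratio can probably be computed in base R without reshaping. The following is only a minimal sketch: it rebuilds the example data so that it runs on its own, and the names grp_cols, freq and freq_by_group are made up for illustration. It assumes every row belongs to exactly one group, as in the generated data.

```r
# Rebuild the example data from the question so the sketch is self-contained
set.seed(13)
groups <- c("grp1", "grp2", "grp3", "grp4", "result")
x <- do.call(rbind, lapply(1:50, function(i) c(sample(c(1, 0, 0, 0), 4), sample(0:1, 1))))
df <- data.frame(x)
colnames(df) <- groups

# Hypothetical helper name: the indicator columns to summarise
grp_cols <- c("grp1", "grp2", "grp3", "grp4")

# For each group column, take the mean of result over the rows that belong
# to that group; for a 0/1 result this is the share of rows with result == 1
freq <- sapply(df[grp_cols], function(g) mean(df$result[g == 1]))

# Arrange as a two-column data frame: group, freq
freq_by_group <- data.frame(group = names(freq), freq = unname(freq))
freq_by_group
```

Because result is coded 0/1, the mean within a group equals sum(value[result == 1]) / sum(value) from the data.table version. A group with no rows would yield NaN rather than an error.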