var*_*max 3 r lapply dplyr data.table
Suppose this is my dataset (dput output):
dataset<-structure(list(group1 = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L), .Label = c("b", "x"), class = "factor"), group2 = structure(c(2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("g", "y"), class = "factor"),
var1 = c(2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L)), .Names = c("group1",
"group2", "var1"), class = "data.frame", row.names = c(NA, -9L
))
I need to compute frequencies for two groups:
x+y
b+g
For the variable var1, I want to count how many times the values 1 and 2 occur within each group. So the desired output is:
     total_count_of_group  var1-1  var1-2
x y                     5       3       2
b g                     4       2       2
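This output means that total_count_of_group for x + y is 5 observations. Within this group, the value 1 occurs 3 times and the value 2 occurs 2 times.
Similarly, total_count_of_group for b + g is 4 observations. Within this group, the value 1 occurs 2 times and the value 2 occurs 2 times.
How can I get such a table?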
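This can be solved in two steps: first add the total count per group to dataset, then reshape from long to wide format, counting the occurrences of each var1 value.
Using data.table: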
library(data.table)
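# Step 1: add the per-group row count by reference; step 2: reshape to wide, counting rows per var1 value
dcast(setDT(dataset)[, total_count_of_group := .N, by = .(group1, group2)],
      group1 + group2 + total_count_of_group ~ paste0("var1_", var1), length)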
   group1 group2 total_count_of_group var1_1 var1_2
1:      b      g                    4      2      2
2:      x      y                    5      3      2
Note that this works for any number of distinct values of var1 as well as any number of groups.
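As a cross-check, the same counts can be sketched in base R without data.table. This is only an illustrative alternative, not part of the answer above: the names grp, counts and result are made up for the example, the dataset is rebuilt with plain character columns instead of factors, and the two grouping columns are assumed to be safe to combine into a single "x+y" style label.

```r
# Rebuild the example data (character columns instead of factors, for brevity)
dataset <- data.frame(
  group1 = c("x", "x", "x", "x", "x", "b", "b", "b", "b"),
  group2 = c("y", "y", "y", "y", "y", "g", "g", "g", "g"),
  var1   = c(2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L)
)

# Combine the two grouping columns into one label per row (hypothetical helper name)
grp <- paste(dataset$group1, dataset$group2, sep = "+")

# Cross-tabulate group label against var1: one row per group, one column per value
counts <- table(grp, dataset$var1)
colnames(counts) <- paste0("var1_", colnames(counts))

# The group total is the row sum of the per-value counts
result <- cbind(total_count_of_group = rowSums(counts), unclass(counts))
print(result)
```

Like the data.table version, table() creates one column per distinct value of var1, so nothing has to be hard-coded for the values 1 and 2.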