dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"))
A B C
1 r g t
2 t r g
3 y r t
4 g t r
5 r y t
I want to tabulate which characters occur together across the three columns, ignoring order. For example:
Combinations Freq
r t g 3
y t r 2
If I also want to add frequency counts for a nominal variable (e.g. gender), how would I do that?
For example:
dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"),
                  Gender = c("male", "female", "female", "male", "male"))
dat
A B C Gender
1 r g t male
2 t r g female
3 y r t female
4 g t r male
5 r y t male
To get this:
Combinations Freq Male Female
r t g 3 2 1
y t r 2 1 1
You could do...
data.frame(table(combo = sapply(split(as.matrix(dat), row(dat)),
                                function(x) paste(sort(x), collapse=" "))))
combo Freq
1 g r t 3
2 r t y 2
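The idea behind the one-liner is to turn each row into an order-independent key (sort the values, then paste them together) and count the keys. A minimal base R sketch of the same steps, using `apply` instead of `split`/`sapply`; the names `combo` and `res` are only illustrative:

```r
dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"))

# One key per row: sort the row's values so that order no longer matters,
# then collapse them into a single string.
combo <- apply(as.matrix(dat), 1, function(x) paste(sort(x), collapse = " "))

# Count how often each key occurs.
res <- as.data.frame(table(combo = combo), stringsAsFactors = FALSE)
res
```

Because the values are sorted first, rows such as `r g t` and `t r g` both map to the key `g r t`.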
For readability, I'd suggest splitting this over multiple lines and/or using magrittr:
d = as.matrix(dat)
library(magrittr)
d %>%
  split(., row(.)) %>%
  sapply(. %>% sort %>% paste(collapse = " ")) %>%
  table(combo = .) %>%
  data.frame
combo Freq
1 g r t 3
2 r t y 2
Regarding the edit / new question, I'd take a somewhat different approach, perhaps something like...
# new example data
dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"),
                  Gender = c("male", "female", "female", "male", "male"))
library(data.table)
library(magrittr)  # provides %>%, used below
setDT(dat)
dat[, combo := sapply(transpose(.SD),
                      . %>% sort %>% paste(collapse = " ")), .SDcols=A:C]
dat[, c(
  n = .N,
  Gender %>% factor(levels=c("male", "female")) %>% table %>% as.list
), by=combo]
combo n male female
1: g r t 3 2 1
2: r t y 2 1 1
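If you would rather avoid the extra packages, the same table can probably be built in base R by cross-tabulating the row keys against `Gender`. This is a sketch of an alternative, not the data.table approach above; the names `combo`, `gender`, `counts` and `res` are illustrative:

```r
dat <- data.frame(A = c("r","t","y","g","r"),
                  B = c("g","r","r","t","y"),
                  C = c("t","g","t","r","t"),
                  Gender = c("male", "female", "female", "male", "male"))

# Order-independent key per row, built from columns A to C only.
combo <- apply(as.matrix(dat[c("A", "B", "C")]), 1,
               function(x) paste(sort(x), collapse = " "))

# Fix the level order so the columns come out as male, female.
gender <- factor(dat$Gender, levels = c("male", "female"))

# Cross-tabulate key by gender; one row per key, one column per gender.
counts <- as.data.frame.matrix(table(combo, gender))

# The total frequency is the row sum of the per-gender counts.
res <- data.frame(combo = rownames(counts),
                  Freq = rowSums(counts),
                  counts,
                  row.names = NULL)
res
```

Setting the factor levels explicitly also keeps a gender column in the result when a level happens to have zero rows.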