Sorry for the vague title. An example is worth a thousand words anyway.
I have a list:
> lst<-list(A=c("one","two", "three"), B=c("two", "four", "five"), C=c("six", "seven"), D=c("one", "five", "eight"))
> lst
$A
[1] "one" "two" "three"
$B
[1] "two" "four" "five"
$C
[1] "six" "seven"
$D
[1] "one" "five" "eight"
I would like to rearrange it into the following matrix:
> m
      A B C D
one   1 0 0 1
two   1 1 0 0
three 1 0 0 0
four  0 1 0 0
five  0 1 0 1
six   0 0 1 0
seven 0 0 1 0
eight 0 0 0 1
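where, basically, each cell indicates the presence (1) or absence (0) of a given value in a given list element.
I tried messing around with various combinations of as.data.frame(), unlist(), table() and melt(), without success, so any pointer in the right direction would be greatly appreciated.
I guess my last resort would be a nested loop that iterates over the list elements and assigns 0 or 1 to the corresponding cell of the matrix, but that seems overly complicated.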
for (...) {
  for (...) {
    if (...) {
      var <- 1
    } else {
      var <- 0
    }
  }
}
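For reference, the nested-loop idea outlined above could be spelled out roughly as follows. This is only a minimal sketch: the names vals, nm and v are introduced here for illustration, and the rows are assumed to be wanted in order of first appearance.

```r
# Sample data from the question
lst <- list(A = c("one", "two", "three"), B = c("two", "four", "five"),
            C = c("six", "seven"), D = c("one", "five", "eight"))

# Row labels: every distinct value, in order of first appearance
vals <- unique(unlist(lst, use.names = FALSE))

# Start from an all-zero matrix: one row per value, one column per list element
m <- matrix(0L, nrow = length(vals), ncol = length(lst),
            dimnames = list(vals, names(lst)))

# Set a cell to 1 whenever the value occurs in that list element
for (nm in names(lst)) {
  for (v in vals) {
    if (v %in% lst[[nm]]) {
      m[v, nm] <- 1L
    }
  }
}
m
```

It works, but it visits every value/element pair explicitly, which is the bookkeeping that a cross-tabulation would do in a single call.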
Thanks!
library(reshape2)
table(melt(lst))
#        L1
#value   A B C D
#  one   1 0 0 1
#  three 1 0 0 0
#  two   1 1 0 0
#  five  0 1 0 1
#  four  0 1 0 0
#  seven 0 0 1 0
#  six   0 0 1 0
#  eight 0 0 0 1
Here is a fairly manual approach:
t(table(rep(names(lst), sapply(lst, length)), unlist(lst)))
#
#         A B C D
#   eight 0 0 0 1
#   five  0 1 0 1
#   four  0 1 0 0
#   one   1 0 0 1
#   seven 0 0 1 0
#   six   0 0 1 0
#   three 1 0 0 0
#   two   1 1 0 0
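To make the one-liner above easier to follow, here is a rough breakdown of its intermediate pieces. The names grp, vals and tab are made up for this illustration, and lengths() (base R 3.2.0 or later) is assumed as a shorthand for sapply(lst, length).

```r
# Sample data from the question
lst <- list(A = c("one", "two", "three"), B = c("two", "four", "five"),
            C = c("six", "seven"), D = c("one", "five", "eight"))

# Each element name repeated once per value it holds: "A" "A" "A" "B" ...
grp <- rep(names(lst), lengths(lst))

# All values flattened into a single character vector
vals <- unlist(lst, use.names = FALSE)

# The two vectors line up position by position, one (name, value) pair each
data.frame(grp, vals)

# Cross-tabulating with the values first puts them in the rows directly,
# so the outer t() is not needed with this argument order
tab <- table(vals, grp)
tab
```

The counts are the same as in the t(table(...)) version; only the argument order differs.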
And stack() works too!
table(stack(lst))
#        ind
# values  A B C D
#   eight 0 0 0 1
#   five  0 1 0 1
#   four  0 1 0 0
#   one   1 0 0 1
#   seven 0 0 1 0
#   six   0 0 1 0
#   three 1 0 0 0
#   two   1 1 0 0
If you care about the row and column order, you can explicitly convert them to factors with the desired levels before calling table():
A <- stack(lst)
A$values <- factor(A$values,
levels=c("one", "two", "three", "four",
"five", "six", "seven", "eight"))
A$ind <- factor(A$ind, c("A", "B", "C", "D"))
table(A)
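If typing out the levels by hand is impractical, one possible variation (a sketch, assuming the desired row order is simply the order of first appearance in the list) is to take the levels from unique(). The unclass() call at the end is likewise an assumption, useful only if a plain matrix is wanted rather than a table object.

```r
# Sample data from the question
lst <- list(A = c("one", "two", "three"), B = c("two", "four", "five"),
            C = c("six", "seven"), D = c("one", "five", "eight"))

A <- stack(lst)

# Levels in order of first appearance instead of a hand-written vector
A$values <- factor(A$values, levels = unique(A$values))

tab <- table(A)

# Dropping the "table" class leaves an ordinary integer matrix with dimnames
m <- unclass(tab)
m
```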
Because benchmarks are fun... even if we are only talking about microseconds... go with unlist()!
set.seed(1)
vec <- sample(3:10, 50, replace = TRUE)
lst <- lapply(vec, function(x) sample(letters, x))
names(lst) <- paste("A", sprintf("%02d", sequence(length(lst))), sep = "")
library(reshape2)
library(microbenchmark)
R2 <- function() table(melt(lst))
S <- function() table(stack(lst))
U <- function() t(table(rep(names(lst), sapply(lst, length)), unlist(lst, use.names=FALSE)))
microbenchmark(R2(), S(), U())
# Unit: microseconds
#  expr       min        lq     median        uq       max neval
#  R2() 36836.579 37521.295 38053.9710 40213.829 45199.749   100
#   S()  1427.830  1473.210  1531.9700  1565.345  3776.860   100
#   U()   892.265   906.488   930.5575   945.326  1261.592   100