Let me define a data frame with a column id made up of a vector of integers
df <- data.frame(id = c(1,2,2,3,3))
and a column objects, which is instead a list of character vectors. Let's create that column with the following function
randomObjects <- function(argument) {
numberObjects <- sample(c(1,2,3,4), 1)
vector <- character()
for (i in 1:numberObjects) {
vector <- c(vector, sample(c("apple","pear","banana"), 1))
}
return(vector)
}
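As a side note, the loop above grows the vector one element at a time. A shorter sketch of the same idea draws all the fruits in a single call to sample with replace = TRUE. The name randomObjectsVec is hypothetical (it is not part of the original question), and the draws are not guaranteed to match the loop version for a given seed. As in the original, the argument is ignored and only exists so the function can be passed to lapply.

```r
# Hypothetical vectorised variant of randomObjects (assumption, not from the question).
randomObjectsVec <- function(argument) {
  fruits <- c("apple", "pear", "banana")
  n <- sample(1:4, 1)                  # how many objects this row gets (1 to 4)
  sample(fruits, n, replace = TRUE)    # draw n fruits at once, repeats allowed
}

set.seed(1)
x <- randomObjectsVec(1)
```

Either version returns a character vector of length 1 to 4, so either can be used to build the list column.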
Then apply randomObjects to each id with lapply
set.seed(28100)
df$objects <- lapply(df$id, randomObjects)
The resulting data frame is
df
# id objects
# 1 1 apple, apple
# 2 2 apple, banana, pear
# 3 2 banana
# 4 3 banana, pear, banana
# 5 3 pear, pear, apple, pear
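To make the structure of the objects column concrete, here is a small sketch that rebuilds the data frame printed above by hand (the values are copied from that output rather than regenerated from the seed) and inspects the list column. It assumes R 3.2.0 or later for lengths.

```r
# Rebuild the printed data frame literally, so the example does not depend on the RNG.
df <- data.frame(id = c(1, 2, 2, 3, 3))
df$objects <- list(c("apple", "apple"),
                   c("apple", "banana", "pear"),
                   "banana",
                   c("banana", "pear", "banana"),
                   c("pear", "pear", "apple", "pear"))

is_list_col <- is.list(df$objects)   # each cell holds a whole character vector
n_per_row   <- lengths(df$objects)   # number of objects stored in each row
all_objects <- unlist(df$objects)    # every object flattened into one vector
```

Here n_per_row is 2, 3, 1, 3, 4 and all_objects has 13 elements, which is the raw material any counting approach has to work with.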
Now I want to count, for each id, how many objects of each kind it has, ending up with a data frame like this
summary <- data.frame(id = c(1, 2, 3),
apples = c(2, 1, 1),
bananas = c(0, 2, 2),
pears = c(0, 1, 4))
summary
# id apples bananas pears
# 1 1 2 0 0
# 2 2 1 2 1
# 3 3 1 2 4
How should I collapse the information in df into a more compact data frame such as summary, without using a for loop?
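library(plyr)
# For each id, flatten that group's character vectors and tabulate them.
# Fixing the factor levels up front keeps zero counts, so every group
# returns the same columns.
ddply(df, .(id), function(d, lev) {
  x <- factor(unlist(d$objects), levels = lev)
  t(as.matrix(table(x)))  # one-row matrix with one column per level
}, lev = unique(unlist(df$objects)))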
# id apple banana pear
#1 1 2 0 0
#2 2 1 2 1
#3 3 1 2 4
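For comparison, the same counts can be sketched in base R without plyr. This is an alternative under stated assumptions, not the approach given above: it rebuilds df literally from the printed output, repeats each id once per object it holds, and cross-tabulates ids against objects. The names long, counts and result are introduced here for illustration, and lengths requires R 3.2.0 or later.

```r
# Literal copy of the data frame shown in the question.
df <- data.frame(id = c(1, 2, 2, 3, 3))
df$objects <- list(c("apple", "apple"),
                   c("apple", "banana", "pear"),
                   "banana",
                   c("banana", "pear", "banana"),
                   c("pear", "pear", "apple", "pear"))

# Long format: one row per (id, object) pair.
long <- data.frame(id = rep(df$id, lengths(df$objects)),
                   object = unlist(df$objects))

# Cross-tabulate; combinations that never occur are counted as 0.
counts <- table(long$id, long$object)

# Turn the table into a plain data frame with id as a regular column.
wide <- as.data.frame.matrix(counts)
result <- cbind(id = as.numeric(rownames(wide)), wide)
rownames(result) <- NULL
```

With this input, result has the columns id, apple, banana and pear, and its rows agree with the summary requested in the question.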