Gathering a ragged data frame into key-value columns

Bac*_*lin 1 r dataframe tidyr

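I recently discovered how to create ragged data frames using the `I` function, but I am having a hard time integrating them with tidyr, ggplot2 and the rest of the Hadleyverse. More specifically, how do I gather a column containing named vectors into key-value columns?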

Suppose I create a data frame like this

make.vector <- function(length.out){
    x <- sample(9, length.out)
    names(x) <- switch(length.out,
        "Alice",
        c("Bob", "Charlie"),
        c("Dave", "Erin", "Frank"),
        c("Gwen", "Harold", "Inez", "James"))
    x
}
mydf <- data.frame(Game = gl(3, 3, labels=LETTERS[1:3]),
                   Set = rep(1:3, 3),
                   Score = I(lapply(rep(2:4, each=3), make.vector)))

which produces

> print(mydf)
  Game Set      Score
1    A   1       8, 3
2    A   2       2, 8
3    A   3       3, 8
4    B   1    1, 5, 4
5    B   2    2, 3, 5
6    B   3    2, 8, 5
7    C   1 7, 2, 3, 4
8    C   2 1, 6, 3, 7
9    C   3 6, 9, 3, 7

The data frame can be manipulated with dplyr and tidyr in a straightforward manner, as long as the result has the expected length.

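library(dplyr)

# Number of players in each row
mydf %>%
    mutate(nPlayers = sapply(Score, length))
# Element-wise sum of the score vectors within each game
mydf %>% 
    group_by(Game) %>%
    summarize(TotalScore = list(Reduce("+", Score)))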

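However, I cannot figure out how to produce multiple rows of output for each original row. Suppose I want to create the following data frame by manipulating `mydf`: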

   Game Set  Player Score
1     A   1     Bob     8
2     A   1 Charlie     3
3     A   2     Bob     2
4     A   2 Charlie     8
5     A   3     Bob     3
6     A   3 Charlie     8
7     B   1    Dave     1
8     B   1    Erin     5
9     B   1   Frank     4
10    B   2    Dave     2
...

The only tool I know of for this is the `gather` function from the tidyr package, but it does not seem to play well with non-atomic data.

mydf %>%
    mutate(Player = lapply(Score, names)) %>%
    gather(P = Player, S = Score)

I suppose I could hack together a solution (as was done in similar earlier questions [1] [2]),

cbind(
    mydf[rep(1:nrow(mydf), sapply(mydf$Score, length)),
         c("Game", "Set")],
    data.frame(
        Player = unlist(lapply(mydf$Score, names)),
        Score = unlist(mydf$Score)
    )
)

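but I have a feeling that if I look back at this code next week, I will have a hard time digesting it. Is there an "official" or at least smarter way to do this? Otherwise I will write a general function for it and add it to my personal library.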
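For reference, here is a minimal sketch of what such a general function might look like. It only wraps the `cbind` approach above in base R. The name `unnest_named` and its arguments are hypothetical (not from any package), and it assumes every element of the list column is a fully named atomic vector.

```r
# Hypothetical helper (not from any package): expand a list column of named
# vectors into one row per element, with separate key and value columns.
# Assumes every vector in the list column carries names.
unnest_named <- function(df, col, key = "Key", value = col) {
    lens <- lengths(df[[col]])            # number of elements per original row
    keep <- setdiff(names(df), col)       # columns to repeat for each element
    out <- df[rep(seq_len(nrow(df)), lens), keep, drop = FALSE]
    out[[key]] <- unlist(lapply(df[[col]], names))
    out[[value]] <- unlist(df[[col]], use.names = FALSE)
    rownames(out) <- NULL
    out
}

# Small fixed example, so the result does not depend on random sampling
toy <- data.frame(Game = c("A", "B"), Set = c(1L, 1L))
toy$Score <- I(list(c(Bob = 8, Charlie = 3),
                    c(Dave = 1, Erin = 5, Frank = 4)))
long <- unnest_named(toy, "Score", key = "Player")
```

With `mydf` the call would be `unnest_named(mydf, "Score", key = "Player")`, which should give the Game/Set/Player/Score layout shown earlier.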

Update

Based on David's answer below, I found that the same result can also be achieved with dplyr.

mydf %>%
    group_by(Game, Set) %>%
    do(with(., data.frame(Player = names(unlist(Score)), 
                          Score = unlist(Score))))

#    Game Set  Player Score
# 1     A   1     Bob     8
# 2     A   1 Charlie     6
# 3     A   2     Bob     7
# 4     A   2 Charlie     6
# 5     A   3     Bob     5
# 6     A   3 Charlie     8
# 7     B   1    Dave     1
# 8     B   1    Erin     9
# 9     B   1   Frank     3
# 10    B   2    Dave     8
# ..  ... ...     ...   ...
# Warning message:
# In rbind_all(out[[1]]) : Unequal factor levels: coercing to character

Dav*_*urg 5

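I would try unlisting by group using data.table. You can also store the result in a temporary variable inside the curly braces (as you would inside a function), so that the unlisting in the `j` expression runs only once per group.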

library(data.table) 
setDT(mydf)[, {
               temp <- unlist(Score) 
               .(Player = names(temp), Score = temp)
              }, by = .(Game, Set)]

#     Game Set  Player Score
#  1:    A   1     Bob     2
#  2:    A   1 Charlie     9
#  3:    A   2     Bob     6
#  4:    A   2 Charlie     3
#  5:    A   3     Bob     2
#  6:    A   3 Charlie     8
#  7:    B   1    Dave     1
#  8:    B   1    Erin     6
#  9:    B   1   Frank     5
# 10:    B   2    Dave     3
#...