Unnesting and merging a list column in a data.frame

Arm*_*min 14 r dataframe dplyr data.table

I have the following data.frame:

id     name   altNames
1001   Joan   character(0)      
1002   Jane   c("Janie", "Janet", "Jan")
1003   John   Jon
1004   Bill   Will
1005   Tom    character(0)      

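altNames can be empty (i.e. character(0)), hold a single name, or hold a vector of names. What I want is a data.frame (or a list) in which every entry from name and/or altNames appears exactly once alongside its corresponding id, like this: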

id     name
1001   Joan
1002   Jane
1002   Janie
1002   Janet
1002   Jan
1003   John
1003   Jon
1004   Bill
1004   Will
1005   Tom

What is the most efficient way to do this? A dplyr-based solution would be even better. Thanks.

Edit: here is the data:

df <- data_frame(
  id = c("1001", "1002","1003", "1004", "1005"), 
  name = c("Joan", "Jane", "John", "Bill", "Tom"), 
  altNames = list(character(0), c("Janie", "Janet", "Jan"), "Jon", "Will", character(0))
)

Dav*_*urg 15

Here is a possible data.table approach:

library(data.table)
# df is the data frame from the question; setDT() converts it to a data.table by reference
setDT(df)[, .(name = c(name, unlist(altNames))), by = id]
#       id  name
#  1: 1001  Joan
#  2: 1002  Jane
#  3: 1002 Janie
#  4: 1002 Janet
#  5: 1002   Jan
#  6: 1003  John
#  7: 1003   Jon
#  8: 1004  Bill
#  9: 1004  Will
# 10: 1005   Tom


pic*_*ick 10

A base R version (using the df that @rawr added):

with(df, {
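    # SIMPLIFY = FALSE keeps ns a list even if every id ends up with the same number of names
    ns <- mapply(c, name, altNames, SIMPLIFY = FALSE)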
    data.frame(id = rep(id, times=lengths(ns)), name=unlist(ns), row.names=NULL)
})
#     id  name
#1  1001  Joan
#2  1002  Jane
#3  1002 Janie
#4  1002 Janet
#5  1002   Jan
#6  1003  John
#7  1003   Jon
#8  1004  Bill
#9  1004  Will
#10 1005   Tom


had*_*ley 6

Here's a complete dplyr + tidyr solution, the way I'd solve it:

library(dplyr)
library(tidyr)

df <- data_frame(
  id = c("1001", "1002","1003", "1004", "1005"), 
  name = c("Joan", "Jane", "John", "Bill", "Tom"), 
  altNames = list(character(0), c("Janie", "Janet", "Jan"), "Jon", "Will", character(0))
)

# Need some way to concatenate a list of vectors with a vector
# in a "rowwise" way
vector_c <- function(...) {
  Map(c, ...)
}

df %>% 
  mutate(
    names = vector_c(name, altNames),
    altNames = NULL,
    name = NULL
  ) %>% 
  unnest(names)
#> Source: local data frame [10 x 2]
#> 
#>      id names
#> 1  1001  Joan
#> 2  1002  Jane
#> 3  1002 Janie
#> 4  1002 Janet
#> 5  1002   Jan
#> 6  1003  John
#> 7  1003   Jon
#> 8  1004  Bill
#> 9  1004  Will
#> 10 1005   Tom

Most of the hard work is done by tidyr::unnest(): it's designed to take a data frame with a list column and unnest it, repeating the other columns as needed.
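To see what unnest() does in isolation, here is a minimal sketch on made-up data. The `toy` tibble and its `vals` column are hypothetical names chosen for illustration, and the `cols =` argument assumes tidyr 1.0.0 or later (older versions, like the one used above, took the column name positionally).

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: `vals` is a list column whose elements differ in length
toy <- tibble(
  id   = c("a", "b"),
  vals = list(c("x", "y"), "z")
)

# unnest() produces one row per list element and repeats `id` next to each one
flat <- unnest(toy, cols = vals)
flat
```

The result has three rows: id "a" appears twice (for "x" and "y") and id "b" once (for "z"). By default, unnest() drops rows whose list element is empty, which is one reason to concatenate name onto altNames before unnesting: every id then has at least one value and none are lost.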