Arm*_*min 14 r dataframe dplyr data.table
I have the following data.frame:
id name altNames
1001 Joan character(0)
1002 Jane c("Janie", "Janet", "Jan")
1003 John Jon
1004 Bill Will
1005 Tom character(0)
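The altNames column can be empty (i.e. character(0)), hold a single name, or hold a vector of names. What I want is a data.frame (or a list) in which every entry from name and/or altNames appears exactly once, paired with its corresponding id, like this: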
id name
1001 Joan
1002 Jane
1002 Janie
1002 Janet
1002 Jan
1003 John
1003 Jon
1004 Bill
1004 Will
1005 Tom
What is the most efficient way to do this? A dplyr-based solution would be even better. Thanks.
Edit: here is the data:
library(tibble)  # data_frame() is deprecated; tibble() replaces it
df <- tibble(
  id = c("1001", "1002", "1003", "1004", "1005"),
  name = c("Joan", "Jane", "John", "Bill", "Tom"),
  altNames = list(character(0), c("Janie", "Janet", "Jan"), "Jon", "Will", character(0))
)
Dav*_*urg 15
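Here is one possible data.table approach:
library(data.table)
# setDT() converts df to a data.table by reference; each id then gets
# its name followed by any alternative names
setDT(df)[, .(name = c(name, unlist(altNames))), by = id]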
# id name
# 1: 1001 Joan
# 2: 1002 Jane
# 3: 1002 Janie
# 4: 1002 Janet
# 5: 1002 Jan
# 6: 1003 John
# 7: 1003 Jon
# 8: 1004 Bill
# 9: 1004 Will
# 10: 1005 Tom
pic*_*ick 10
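A base R version (using the df that @rawr added):
with(df, {
  # Pair each name with its alternatives; SIMPLIFY = FALSE guarantees a list
  # even if every row happens to have the same number of names
  ns <- mapply(c, name, altNames, SIMPLIFY = FALSE)
  # Repeat each id once per name, then flatten the list of names
  data.frame(id = rep(id, times = lengths(ns)), name = unlist(ns), row.names = NULL)
})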
# id name
#1 1001 Joan
#2 1002 Jane
#3 1002 Janie
#4 1002 Janet
#5 1002 Jan
#6 1003 John
#7 1003 Jon
#8 1004 Bill
#9 1004 Will
#10 1005 Tom
Here is a complete dplyr + tidyr solution, the way I would approach it:
library(dplyr)
library(tidyr)
# tibble() (re-exported by dplyr) replaces the deprecated data_frame()
df <- tibble(
  id = c("1001", "1002", "1003", "1004", "1005"),
  name = c("Joan", "Jane", "John", "Bill", "Tom"),
  altNames = list(character(0), c("Janie", "Janet", "Jan"), "Jon", "Will", character(0))
)
# Concatenate a vector with a list of vectors element by element,
# i.e. in a "rowwise" way.
# unname() drops the element names that Map() copies from `name`.
vector_c <- function(...) {
  unname(Map(c, ...))
}
df %>%
  mutate(
    names = vector_c(name, altNames),
    altNames = NULL,
    name = NULL
  ) %>%
  unnest(names)
#> Source: local data frame [10 x 2]
#>
#> id names
#> 1 1001 Joan
#> 2 1002 Jane
#> 3 1002 Janie
#> 4 1002 Janet
#> 5 1002 Jan
#> 6 1003 John
#> 7 1003 Jon
#> 8 1004 Bill
#> 9 1004 Will
#> 10 1005 Tom
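Most of the hard work is done by tidyr::unnest(): it is designed to take a data frame with a list-column and flatten that column into one row per element, repeating the other columns as needed.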
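To see what unnest() does on its own, here is a minimal sketch. It assumes tidyr 1.0.0 or later (where unnest() takes the cols and keep_empty arguments), and the toy tibble is made up for illustration. It also suggests why the answer above first prepends name to each list element: an empty element such as character(0) produces no row by default, so ids without alternative names would otherwise disappear.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: one list-column with an empty element
toy <- tibble(
  id = c("1001", "1002", "1003"),
  altNames = list(character(0), c("Janie", "Janet"), "Jon")
)

# Each element of the list-column becomes its own row and `id` is repeated.
# The row whose list element is empty (id 1001) is dropped by default.
long <- unnest(toy, cols = altNames)

# keep_empty = TRUE keeps that row, with NA in the unnested column.
long_keep <- unnest(toy, cols = altNames, keep_empty = TRUE)

print(long)
print(long_keep)
```

With the default settings id 1001 is missing from long, while long_keep retains it with a missing altNames value.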