R - Expand date ranges into panel data by group

Ent*_*opy 2 r date seq

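I have date ranges grouped by two variables (id and type), currently stored in a data frame called data. My goal is to expand each date range so that I have one row per day within the range, carrying the same id and type.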

The following snippet reproduces an example of the data frame:

data <- structure(list(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), type = c("a", 
"a", "b", "c", "b", "a", "c", "d", "e", "f"), from = structure(c(1235199600, 
1235545200, 1235545200, 1235631600, 1235631600, 1242712800, 1242712800, 
1243058400, 1243058400, 1243231200), class = c("POSIXct", "POSIXt"
), tzone = ""), to = structure(c(1235372400, 1235545200, 1235631600, 
1235890800, 1236236400, 1242712800, 1243058400, 1243231200, 1243144800, 
1243576800), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("id", 
"type", "from", "to"), row.names = c(700L, 753L, 2941L, 2178L, 
 2959L, 679L, 2185L, 12L, 802L, 1796L), class = "data.frame")

Here is a visual representation of the dataset:

id  type  from        to
1   a     2009-02-21  2009-02-23
1   a     2009-02-25  2009-02-25
1   b     2009-02-25  2009-02-26
1   c     2009-02-26  2009-03-01
1   b     2009-02-26  2009-03-05
2   a     2009-05-19  2009-05-19
2   c     2009-05-19  2009-05-23
2   d     2009-05-23  2009-05-25
2   e     2009-05-23  2009-05-24
2   f     2009-05-25  2009-05-29

Here is a visual representation of the expected result:

id  type  date
1   a     2009-02-21
1   a     2009-02-22
1   a     2009-02-23
1   b     2009-02-25
1   b     2009-02-26
1   c     2009-02-26
1   c     2009-02-27
1   c     2009-02-28
1   c     2009-03-01
...
2   f     2009-05-25
2   f     2009-05-26
2   f     2009-05-27
2   f     2009-05-28
2   f     2009-05-29

I found a few similar posts (link, link) that helped somewhat. I tried a plyr solution:

data2 <- adply(data, 1, summarise, date = seq(data$from, data$to))[c('id', 'type')]

However, this results in an error:

Error: 'from' must be of length 1

I also tried a data.table solution:

data[, list(date = seq(from, to)), by = c('id', 'type')]

However, this gives me a different error:

Error in `[.data.frame`(data, , list(date = seq(from, to)), by = c("id",  : 
unused argument (by = c("id", "type"))

Any ideas on how to resolve these errors (or on a different approach) would be greatly appreciated.

G. *_*eck 7

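1) by Here is a three-line answer using by from base R. First we convert the dates to "Date" class, giving data2. Then we apply f, which does the actual work on each row, and finally we rbind the resulting rows together: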

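# Convert the POSIXct columns to Date class
data2 <- transform(data, from = as.Date(from), to = as.Date(to))

# Expand a single row into one row per day
f <- function(x) with(x, data.frame(id, type, date = seq(from, to, by = "day")))
# Apply f to each row of data2 and stack the results
do.call("rbind", by(data2, 1:nrow(data2), f))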
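To try approach (1) without the question's data, here is a minimal sketch on a two-row toy data frame. The names toy, toy2, expand_row and panel and the toy values are made up for illustration, and the time zone is fixed to "UTC" as an assumption so that the calendar day does not depend on the session's time zone.

```r
# Toy input (hypothetical values): two date ranges, each with an id and a type.
toy <- data.frame(
  id   = c(1, 2),
  type = c("a", "f"),
  from = as.POSIXct(c("2009-02-21", "2009-05-25"), tz = "UTC"),
  to   = as.POSIXct(c("2009-02-23", "2009-05-29"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Convert POSIXct to Date, stating the time zone explicitly (an assumption here).
toy2 <- transform(toy,
                  from = as.Date(from, tz = "UTC"),
                  to   = as.Date(to, tz = "UTC"))

# Expand one row into one row per day; id and type are recycled to match.
expand_row <- function(x) {
  with(x, data.frame(id, type, date = seq(from, to, by = "day")))
}

# Apply the function to each row and stack the pieces into one data frame.
panel <- do.call("rbind", by(toy2, seq_len(nrow(toy2)), expand_row))
rownames(panel) <- NULL  # drop the row names that rbind builds from the pieces
```

The first range covers three days and the second five, so panel should have eight rows, one per id, type and day.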

2) data.table Using the same approach with data2 and data.table, we do this:

library(data.table)

dt <- data.table(data2)
dt[, list(id, type, date = seq(from, to, by = "day")), by = 1:nrow(dt)]

2a) data.table Or alternatively, with dt from (2) and f from (1):

dt[, f(.SD), by = 1:nrow(dt)]

3) dplyr With dplyr this gives a warning but otherwise works, using data2 and f from (1):

library(dplyr)
data2 %>% rowwise() %>% do(f(.))
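As for the two errors in the question, a likely reading is this. The plyr attempt passes the whole columns data$from and data$to to seq, which accepts only a single start value. The data.table attempt uses by = on a plain data.frame, which (2) avoids by converting with data.table() first. The sketch below illustrates the first point in base R using Date vectors rather than the question's POSIXct columns. The names from, to, msg and days are illustrative only.

```r
# Two ranges held as parallel Date vectors (hypothetical values).
from <- as.Date(c("2009-02-21", "2009-02-25"))
to   <- as.Date(c("2009-02-23", "2009-02-25"))

# Passing whole vectors fails because seq() expects one start value;
# the error message is captured instead of stopping the script.
msg <- tryCatch(seq(from, to, by = "day"),
                error = function(e) conditionMessage(e))

# One seq() call per (from, to) pair works; Map walks both vectors in parallel.
days <- Map(function(f, t) seq(f, t, by = "day"), from, to)
```

Here days is a list with one Date vector per range, which is the same per-row idea that f in (1) relies on.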

Update: Some improvements.