Split date rows at the new year

Hel*_*len 7 r

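I have data from a hospital with many variables, plus a from date and a to date for each row that tell us when the row is "valid". Each row can be valid for at most one year.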

test = data.frame(ID=c(10,10,10,12,12), Disease=c("P","P","P","D","P"), Pass=c("US","US","US","EN","EN"),
                  Payment=c(110,110,115,240,255), 
                  from_date=as.POSIXct(c("2008-01-09","2009-01-09","2010-01-09","2008-01-01","2013-12-31")),
                  to_date=as.POSIXct(c("2009-01-08","2010-01-08","2011-01-08","2008-12-31","2014-12-30"))
                  )

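For rows that run from one year into the next, I want to split the row so that I end up with two rows in place of the original, and adjust from_date and to_date accordingly, so that the new dataset looks like this: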

  test_desired = data.frame(ID=c(10,10,10,10,10,10,12,12,12), Disease=c("P","P","P","P","P","P","D","P","P"), Pass=c("US","US","US","US","US","US","EN","EN","EN"),
                              Payment=c(110,110,110,110,115,115,240,255,255), 
                              from_date=as.POSIXct(c("2008-01-09","2009-01-01","2009-01-09","2009-01-01","2010-01-09","2011-01-01","2008-01-01","2013-12-31","2014-01-01")),
                              to_date=as.POSIXct(c("2008-12-31","2009-01-08","2009-12-31","2010-01-08","2010-12-31","2011-01-08","2008-12-31","2013-12-31","2014-12-30"))
    )    

Attempt

library(lubridate) #for function "year" below
test_desired=test
row=c()
tmp=c()
for(i in 1:nrow(test_desired)){
  if(year(test_desired$from_date)[i]<year(test_desired$to_date)[i]){
    test_desired$to_date[i] = as.POSIXct(paste0(year(test_desired$from_date[i]),"-12-31"))
    row = test_desired[i,]
    row$from_date = as.POSIXct(paste0(year(test$to_date[i]),"-01-01"))
    row$to_date = test$to_date[i]
    tmp=rbind(tmp,row)

  } else next
}
test_desired=rbind(test_desired,tmp)
library(dplyr)
test_desired=arrange(test_desired,ID,from_date)

Is there a more elegant way to do this, for example with dplyr?

小智 3

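Here is a tidyverse-based solution. It is similar to Lennyy's, but it has fewer conditional checks and no problems with added times (they may show up in a tibble, but only as 00:00:00). I added ungroup() because it sounds like you have a grouping variable somewhere (from your comment under Lennyy's solution). If you don't, you can remove it: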

library(dplyr)
library(lubridate)
library(purrr)

test %>% 
    ungroup() %>% # This isn't necessary if there are no groupings.
    split(rownames(test)) %>% 
    map_dfr(function(df){
        if (year(df$from_date) == year(df$to_date)) return(df)
        bind_rows(mutate(df, to_date = rollback(floor_date(to_date, "y"))),
                  mutate(df, from_date = floor_date(to_date, "y"))
                  )
    }
    )

#### OUTPUT ####

  ID Disease Pass Payment  from_date    to_date
1 10       P   US     110 2008-01-09 2008-12-31
2 10       P   US     110 2009-01-01 2009-01-08
3 10       P   US     110 2009-01-09 2009-12-31
4 10       P   US     110 2010-01-01 2010-01-08
5 10       P   US     115 2010-01-09 2010-12-31
6 10       P   US     115 2011-01-01 2011-01-08
7 12       D   EN     240 2008-01-01 2008-12-31
8 12       P   EN     255 2013-12-31 2013-12-31
9 12       P   EN     255 2014-01-01 2014-12-30

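To explain a little: the data frame is split into a list of one-row data frames. I then use map_dfr to run the function on each of them; a row whose from_date and to_date fall in the same year is returned unchanged, and a row that spans two years is split in two. map_dfr also binds the resulting data frames back together. In the anonymous function, I floor to_date to the start of its year. For the new to_date value in the first row, I roll that back to the last day of the previous month (December 31 of the earlier year); for the new from_date value in the second row, I leave it as is (January 1).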
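To make the two date steps concrete, here is a minimal sketch on a single value. The date is taken from the second row of the expected output, and tz = "UTC" is an assumption added here only so the result does not depend on the local time zone; the question's data does not set one.

```r
library(lubridate)

# Illustrative value only; tz = "UTC" is an assumption, not part of the question's data.
to_date <- as.POSIXct("2009-01-08", tz = "UTC")

# Floor to the start of the year that to_date falls in (January 1).
year_start <- floor_date(to_date, "year")

# Roll back to the last day of the previous month (December 31 of the prior year).
prev_year_end <- rollback(year_start)

format(year_start, "%Y-%m-%d")
format(prev_year_end, "%Y-%m-%d")
```

In the answer's function, year_start becomes the from_date of the second row and prev_year_end becomes the to_date of the first row.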