How do I calculate the number of days since the start date, by group?

jen*_*irf 5 aggregate r

I need to go from this

 id  |    date
-----------------
  A  | 2000-01-13
  A  | 2000-01-18
  A  | 2000-01-25
  B  | 2012-10-10
  B  | 2012-10-11
  C  | 2005-07-25
  C  | 2005-07-31

to this

 id  |    date     | days from start
---------------------------
  A  | 2000-01-13  |  0
  A  | 2000-01-18  |  5
  A  | 2000-01-25  |  12
  A  | 2000-02-08  |  26
  B  | 2012-10-10  |  0
  B  | 2012-10-11  |  1
  C  | 2005-07-25  |  0
  C  | 2005-07-31  |  6

That is, create a variable that holds the number of days elapsed since the first date, grouped by id.

Any ideas?

Aru*_*run 10

Using data.table (I am assuming the date column is character here; if it is already of class Date, you can drop the as.Date(.) call):

df <- structure(list(id = c("A", "A", "A", "B", "B", "C", "C"), 
             date = c("2000-01-13", "2000-01-18", "2000-01-25", "2012-10-10", 
                    "2012-10-11", "2005-07-25", "2005-07-31")), 
             .Names = c("id", "date"), row.names = c(NA, -7L), 
             class = "data.frame")
library(data.table)
dt <- data.table(df, key = "id")
# Within each id: gaps in days between consecutive rows, accumulated from 0
dt[, days_from_start := cumsum(c(0, diff(as.Date(date)))), by = id]

#    id       date days_from_start
# 1:  A 2000-01-13               0
# 2:  A 2000-01-18               5
# 3:  A 2000-01-25              12
# 4:  B 2012-10-10               0
# 5:  B 2012-10-11               1
# 6:  C 2005-07-25               0
# 7:  C 2005-07-31               6
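The cumulative sum of consecutive differences is the same as subtracting each group's first date from every row, provided the dates are sorted within each id. A minimal base R sketch of that idea follows, using ave(). It is not part of the original answer; it measures from the earliest date in each group (min) rather than from the first row, which is an assumption about what "start" means when the rows are unsorted.

```r
# Same sample data as in the answer above
df <- data.frame(
  id   = c("A", "A", "A", "B", "B", "C", "C"),
  date = c("2000-01-13", "2000-01-18", "2000-01-25", "2012-10-10",
           "2012-10-11", "2005-07-25", "2005-07-31"),
  stringsAsFactors = FALSE
)

# Convert to Date, then to plain numbers (days since 1970-01-01)
d <- as.numeric(as.Date(df$date))

# ave() applies FUN within each id group and returns the result
# in the original row order
df$days_from_start <- ave(d, df$id, FUN = function(x) x - min(x))

df
```

Because ave() keeps the original row order, this does not depend on the rows being grouped by id beforehand.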

  • I was about to post exactly the same thing! (2 upvotes)

pla*_*pus 5

You can also use a combination of the functions difftime and split:

dat
  id       date
1  A 2000-01-13
2  A 2000-01-18
3  A 2000-01-25
4  B 2012-10-10
5  B 2012-10-11
6  C 2005-07-25
7  C 2005-07-31

dat$date <- as.POSIXct(dat$date)
dat$"Days spent" <- unlist(lapply(split(dat,f=dat$id),
                         function(x){as.numeric(difftime(x$date,x$date[1], units="days"))}))
dat
  id       date Days spent
1  A 2000-01-13          0
2  A 2000-01-18          5
3  A 2000-01-25         12
4  B 2012-10-10          0
5  B 2012-10-11          1
6  C 2005-07-25          0
7  C 2005-07-31          6

Following the suggestions of @agstudy and @Arun, this can be simplified as follows:

dat$"Days spent" <- unlist(by(dat, dat$id, 
                           function(x)difftime(x$date,x$date[1], units= "days")))
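Two details of the split-based approach may be worth guarding against. First, unlist() on the split result returns values in group order, so assigning them back only lines up when the rows are already grouped by id. Second, as.POSIXct() parses the dates in the local time zone, so a span that crosses a daylight-saving change can come out as a fractional number of days. The sketch below is one possible variant and not part of the original answer: it uses as.Date() and unsplit(), and the column name days_spent is chosen here purely for illustration.

```r
dat <- data.frame(
  id   = c("A", "A", "A", "B", "B", "C", "C"),
  date = c("2000-01-13", "2000-01-18", "2000-01-25", "2012-10-10",
           "2012-10-11", "2005-07-25", "2005-07-31"),
  stringsAsFactors = FALSE
)

# Date has no time-of-day component, so differences are whole days
dat$date <- as.Date(dat$date)

# Split the dates by id and measure each group from its first row
per_group <- lapply(split(dat$date, dat$id),
                    function(x) as.numeric(difftime(x, x[1], units = "days")))

# unsplit() puts the values back in the original row order
dat$days_spent <- unsplit(per_group, dat$id)

dat
```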