I need to go from this:
id | date
-----------------
A | 2000-01-13
A | 2000-01-18
A | 2000-01-25
B | 2012-10-10
B | 2012-10-11
C | 2005-07-25
C | 2005-07-31
to this:
id | date | days from start
---------------------------
A | 2000-01-13 | 0
A | 2000-01-18 | 5
A | 2000-01-25 | 12
A | 2000-02-08 | 26
B | 2012-10-10 | 0
B | 2012-10-11 | 1
C | 2005-07-25 | 0
C | 2005-07-31 | 6
That is, create a variable that holds the number of days elapsed since the first date, grouped by id.
Any ideas?
Aru*_*run 10
Using data.table (I'm assuming the date column is character here; if it is already of class Date, you can drop the as.Date(.) call):
df <- structure(list(id = c("A", "A", "A", "B", "B", "C", "C"),
date = c("2000-01-13", "2000-01-18", "2000-01-25", "2012-10-10",
"2012-10-11", "2005-07-25", "2005-07-31")),
.Names = c("id", "date"), row.names = c(NA, -7L),
class = "data.frame")
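library(data.table)
dt <- data.table(df, key="id")
# diff() gives the gap in days between consecutive dates; cumsum() accumulates the gaps within each id
dt[, days_from_start := cumsum(c(0, diff(as.Date(date)))), by=id]
dt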
# id date days_from_start
# 1: A 2000-01-13 0
# 2: A 2000-01-18 5
# 3: A 2000-01-25 12
# 4: B 2012-10-10 0
# 5: B 2012-10-11 1
# 6: C 2005-07-25 0
# 7: C 2005-07-31 6
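As a variation on the note about the column type: if date is already stored as a Date, the difference from the earliest date in each group can be taken directly. This is only a sketch under that assumption; the table below is rebuilt from the question's data, and min(date) is used instead of relying on row order.

```r
library(data.table)

# Same data as in the question, but with `date` already of class Date (assumption).
dt <- data.table(
  id   = c("A", "A", "A", "B", "B", "C", "C"),
  date = as.Date(c("2000-01-13", "2000-01-18", "2000-01-25",
                   "2012-10-10", "2012-10-11",
                   "2005-07-25", "2005-07-31"))
)

# Subtracting two Dates gives a difftime in days; as.numeric() drops the class.
dt[, days_from_start := as.numeric(date - min(date)), by = id]
dt
```

Because it subtracts the group minimum rather than accumulating consecutive gaps, this version does not depend on the dates being sorted within each id.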
You can also use a combination of the functions difftime and split:
dat
id date
1 A 2000-01-13
2 A 2000-01-18
3 A 2000-01-25
4 B 2012-10-10
5 B 2012-10-11
6 C 2005-07-25
7 C 2005-07-31
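dat$date <- as.POSIXct(dat$date, tz="UTC")  # fixed time zone, so a DST change cannot produce fractional days
# note: assumes the rows are already sorted by id, because split() returns the groups in id order
dat$"Days spent" <- unlist(lapply(split(dat,f=dat$id),
function(x){as.numeric(difftime(x$date,x$date[1], units="days"))}))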
dat
id date Days spent
1 A 2000-01-13 0
2 A 2000-01-18 5
3 A 2000-01-25 12
4 B 2012-10-10 0
5 B 2012-10-11 1
6 C 2005-07-25 0
7 C 2005-07-31 6
As suggested by @agstudy and @Arun, this can be simplified as follows:
dat$"Days spent" <- unlist(by(dat, dat$id,
function(x)difftime(x$date,x$date[1], units= "days")))
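One thing to watch with the split/by approaches: unlist() returns the values in group order, so they only line up with the rows when the data frame is already sorted by id. A possible base R alternative, offered as a sketch rather than as part of the answers above, is ave(), which writes each group's result back to the original row positions. The data frame here is the question's data deliberately shuffled, and the column name days_from_start is just an illustrative choice.

```r
# Question's data in a deliberately unsorted row order (illustrative).
dat <- data.frame(
  id   = c("B", "A", "A", "C", "B", "A", "C"),
  date = as.Date(c("2012-10-10", "2000-01-13", "2000-01-18", "2005-07-25",
                   "2012-10-11", "2000-01-25", "2005-07-31"))
)

# as.numeric() on a Date gives days since 1970-01-01, so subtracting the
# group minimum yields whole days since the first date of each id.
dat$days_from_start <- ave(as.numeric(dat$date), dat$id,
                           FUN = function(d) d - min(d))
dat
```

Using min(d) rather than d[1] means the first row of a group does not have to be its earliest date.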