ssa*_*san 3 r lubridate dplyr data.table
I have data like the following.
id from_date to_date
1 2015-03-09 2015-03-14
2 2015-02-22 2015-02-24
2 2015-05-06 2015-05-17
3 2015-02-12 2015-02-16
4 2015-03-10 2015-03-16
4 2015-03-22 2015-04-07
4 2015-06-07 2015-07-07
4 2015-07-06 2015-07-07
4 2015-08-02 2015-08-07
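I want to create a separate variable holding the difference, grouped by id, between one row's date and the next row's date, so the first row of each id will be NA. I tried the following approach, based on another Stack Overflow answer, but could not get it to work.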
library(data.table)
chf1 = data.table(id = chf$id, from_date = chf$f.date, to_date = chf$t.date)
setkey(chf1,id)
chf1[,diff:=c(NA,difftime(from_date, to_date, units = "days")),by=id]
The output should look like
id from_date to_date difference
1 2015-03-09 2015-03-14 NA
2 2015-02-22 2015-02-24 NA
2 2015-05-06 2015-05-17 71
3 2015-02-12 2015-02-16 NA
4 2015-03-10 2015-03-16 NA
4 2015-03-22 2015-04-07 6
4 2015-06-07 2015-06-10 64
4 2015-07-06 2015-07-07 26
4 2015-08-02 2015-08-07 26
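There are three problems with the code:
1) chf1$from_date and chf1$to_date pull the entire column, so grouping by 'id' has no effect.
2) difftime returns output of the same length as the input columns, so prepending NA makes the result one element too long.
3) difftime takes the difference between each element of 'from_date' and the corresponding element of 'to_date', so by = id is not needed.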
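Points 2 and 3 can be checked with a small sketch. The two-row vectors below are illustrative values copied from the question, and the variable names d, padded, n_diff and n_padded are made up for this example.

```r
# Two Date vectors of equal length (values taken from the question's id 2 rows)
from_date <- as.Date(c("2015-02-22", "2015-05-06"))
to_date   <- as.Date(c("2015-02-24", "2015-05-17"))

# difftime() is vectorised: element i of the result is from_date[i] - to_date[i]
d <- difftime(from_date, to_date, units = "days")
n_diff <- length(d)         # same length as the inputs

# Prepending NA adds one element, so the result no longer matches the row count
padded <- c(NA, d)
n_padded <- length(padded)  # one more than the inputs
```

Because the padded vector is longer than the group it is assigned to, it cannot be used as-is on the right-hand side of := for that group.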
So the code can be
chf1[, diff1:=difftime(from_date, to_date, units = "days")]
chf1
# id from_date to_date diff1
#1: 1 2015-03-09 2015-03-14 -5 days
#2: 2 2015-02-22 2015-02-24 -2 days
#3: 2 2015-05-06 2015-05-17 -11 days
#4: 3 2015-02-12 2015-02-16 -4 days
#5: 4 2015-03-10 2015-03-16 -6 days
#6: 4 2015-03-22 2015-04-07 -16 days
#7: 4 2015-06-07 2015-07-07 -30 days
#8: 4 2015-07-06 2015-07-07 -1 days
#9: 4 2015-08-02 2015-08-07 -5 days
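Based on the clarification in the OP's post, if we need the difference between the next value of 'from_date' and the current 'to_date' after grouping by 'id', use difftime on the shifted (type = "lead") 'from_date' and 'to_date', and assign (:=) the result to 'diff1'.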
chf1[, diff1 := difftime(shift(from_date, type = "lead"), to_date,
units = "days") , by = id]
chf1
# id from_date to_date diff1
#1: 1 2015-03-09 2015-03-14 NA days
#2: 2 2015-02-22 2015-02-24 71 days
#3: 2 2015-05-06 2015-05-17 NA days
#4: 3 2015-02-12 2015-02-16 NA days
#5: 4 2015-03-10 2015-03-16 6 days
#6: 4 2015-03-22 2015-04-07 61 days
#7: 4 2015-06-07 2015-07-07 -1 days
#8: 4 2015-07-06 2015-07-07 26 days
#9: 4 2015-08-02 2015-08-07 NA days
Or perhaps
chf1[, diff1 := difftime(from_date, shift(to_date), units = "days"), by = id]
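For reference, here is a self-contained sketch of what this lagged variant produces on the question's input rows. The table is rebuilt from date strings so the snippet runs on its own, and the as.numeric() wrapper is an addition made here (not part of the answer) to drop the "days" label.

```r
library(data.table)

# Same nine rows as the question's input, rebuilt so the snippet is self-contained
chf1 <- data.table(
  id        = c(1L, 2L, 2L, 3L, 4L, 4L, 4L, 4L, 4L),
  from_date = as.Date(c("2015-03-09", "2015-02-22", "2015-05-06", "2015-02-12",
                        "2015-03-10", "2015-03-22", "2015-06-07", "2015-07-06",
                        "2015-08-02")),
  to_date   = as.Date(c("2015-03-14", "2015-02-24", "2015-05-17", "2015-02-16",
                        "2015-03-16", "2015-04-07", "2015-07-07", "2015-07-07",
                        "2015-08-07"))
)

# Within each id, lag to_date by one row and subtract it from the current
# from_date; the first row of every id has no previous to_date, so it is NA
chf1[, diff1 := as.numeric(difftime(from_date, shift(to_date), units = "days")),
     by = id]

chf1$diff1
# NA NA 71 NA NA 6 61 -1 26
```

This puts NA on the first row of each id, which is the layout the question asks for. For id 4 it yields 61 and -1 where the question's desired table shows 64 and 26; note that the desired table lists a different to_date (2015-06-10) for the 2015-06-07 row than the input does.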
Data used in the examples above:
chf <- structure(list(id = c(1L, 2L, 2L, 3L, 4L, 4L, 4L, 4L, 4L),
f.date = structure(c(16503,
16488, 16561, 16478, 16504, 16516, 16593, 16622, 16649), class = "Date"),
t.date = structure(c(16508, 16490, 16572, 16482, 16510, 16532,
16623, 16623, 16654), class = "Date")), .Names = c("id",
"f.date", "t.date"), row.names = c(NA, -9L), class = "data.frame")
chf1 = data.table(id = chf$id,from_date = chf$f.date,to_date = chf$t.date)