My question is about how to calculate, in R, the number of days since an event occurred. Here is a minimal example of the data:
df <- data.frame(date = as.Date(c("06/07/2000", "15/09/2000", "15/10/2000", "03/01/2001", "17/03/2001", "23/05/2001", "26/08/2001"), format = "%d/%m/%Y"),
                 event = c(0, 0, 1, 0, 1, 1, 0))
date event
1 2000-07-06 0
2 2000-09-15 0
3 2000-10-15 1
4 2001-01-03 0
5 2001-03-17 1
6 2001-05-23 1
7 2001-08-26 0
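The binary variable (event) is 1 when the event occurred and 0 otherwise. Repeated observations were made at different times (date). The expected output, with the number of days since the last event (tae), is as follows: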
date event tae
1 2000-07-06 0 NA
2 2000-09-15 0 NA
3 2000-10-15 1 0
4 2001-01-03 0 80
5 2001-03-17 1 153
6 2001-05-23 1 67
7 2001-08-26 0 95
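Reading the table, each tae value appears to be the row's date minus the date of the most recent event strictly before that row. A quick base-R check of that reading, using only dates copied from the table (the variable names are illustrative):

```r
# Dates of the events (rows 3, 5 and 6)
ev  <- as.Date(c("2000-10-15", "2001-03-17", "2001-05-23"))
# Dates of rows 4 to 7
obs <- as.Date(c("2001-01-03", "2001-03-17", "2001-05-23", "2001-08-26"))

# Each observation minus the latest event strictly before it
tae <- as.numeric(obs - ev[c(1, 1, 2, 3)])
tae  # 80 153 67 95
```

Under this reading, row 3 has no earlier event to measure from.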
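I have searched for answers to similar questions, but none of them address my specific problem. I tried to implement ideas from a similar post (calculate elapsed time since last event); below is the closest I got to a solution: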
library(dplyr)
df %>%
mutate(tmp_a = c(0, diff(date)) * !event,
tae = cumsum(tmp_a))
This produces the output below, which is not what I expected:
date event tmp_a tae
1 2000-07-06 0 0 0
2 2000-09-15 0 71 71
3 2000-10-15 1 0 71
4 2001-01-03 0 80 151
5 2001-03-17 1 0 151
6 2001-05-23 1 0 151
7 2001-08-26 0 95 246
Any help with fine-tuning this, or with a different approach, would be greatly appreciated.
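The attempt above goes wrong in two ways: multiplying by !event discards the gap leading into an event row (row 5 should include the 73 days since row 4), and cumsum never restarts after an event. A minimal base-R sketch of the same diff/cumsum idea with both fixed, assuming the rows are sorted by date; the helper names gap and n_prev are illustrative, not from the original post:

```r
df <- data.frame(
  date  = as.Date(c("06/07/2000", "15/09/2000", "15/10/2000", "03/01/2001",
                    "17/03/2001", "23/05/2001", "26/08/2001"), format = "%d/%m/%Y"),
  event = c(0, 0, 1, 0, 1, 1, 0)
)

# Days between consecutive observations (0 for the first row)
gap <- c(0, as.numeric(diff(df$date)))

# Number of events strictly before each row: cumsum(event) lagged by one
n_prev <- c(0, head(cumsum(df$event), -1))

# Running total of the gaps, restarted every time a new event has been seen
tae <- ave(gap, n_prev, FUN = cumsum)

# Rows with no earlier event have nothing to measure from
tae[n_prev == 0] <- NA
df$tae <- tae
df$tae  # NA NA NA 80 153 67 95
```

Like the answer below, this gives NA for row 3 rather than the 0 shown in the expected output, because no event precedes it.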
Nic*_*icE 10
You could try something like this:
# index (into the vector of event dates) of the latest event up to each row
last_event_index <- cumsum(df$event) + 1
# shift it by one position to the right so each row points to the latest event strictly before it
last_event_index <- c(1, head(last_event_index, -1))
# get the dates of the events and index that vector with last_event_index;
# an NA is prepended as the first "date" because no event precedes the first rows
last_event_date <- c(as.Date(NA), df[which(df$event == 1), "date"])[last_event_index]
# subtract the date of the last event from each row's date
df$tae <- df$date - last_event_date
df
# date event tae
#1 2000-07-06 0 NA days
#2 2000-09-15 0 NA days
#3 2000-10-15 1 NA days
#4 2001-01-03 0 80 days
#5 2001-03-17 1 153 days
#6 2001-05-23 1 67 days
#7 2001-08-26 0 95 days
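A different approach, not the answerer's: base R's findInterval with left.open = TRUE counts, for each row, how many event dates are strictly earlier than that row's date, which is exactly the position of the latest earlier event in a sorted vector of event dates. This sketch does not rely on the rows of df being in date order (only ev_dates is sorted); the names ev_dates, idx and prev_event are illustrative:

```r
df <- data.frame(
  date  = as.Date(c("06/07/2000", "15/09/2000", "15/10/2000", "03/01/2001",
                    "17/03/2001", "23/05/2001", "26/08/2001"), format = "%d/%m/%Y"),
  event = c(0, 0, 1, 0, 1, 1, 0)
)

# Sorted dates on which an event occurred
ev_dates <- sort(df$date[df$event == 1])

# Number of event dates strictly earlier than each row's date,
# i.e. the position of the latest earlier event in ev_dates
idx <- findInterval(as.numeric(df$date), as.numeric(ev_dates), left.open = TRUE)

# idx == 0 means no earlier event; indexing with NA yields an NA date
prev_event <- ev_dates[replace(idx, idx == 0, NA)]

df$tae <- as.numeric(df$date - prev_event)
df$tae  # NA NA NA 80 153 67 95
```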