Calculate the time elapsed since the last event

Ent*_*opy 4 time if-statement r dplyr

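I have a data frame containing multiple subjects (`id`) with repeated observations (recorded at times `time`). Each time may or may not be associated with an event (`event`). An example data frame can be generated with the following commands: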

set.seed(12345)
id <- c(rep(1, 9), rep(2, 9), rep(3, 9))
time <- c(seq(from = 0, to = 96, by = 12),
      seq(from = 0, to = 80, by = 10),
      seq(from = 0, to = 112, by = 14))
random <- runif(n = 27)
event <- rep(100, 27)

df <- data.frame(cbind(id, time, event, random))
df$event <- ifelse(df$random < 0.55, 0, df$event)
df <- subset(df, select = -c(random))
df$event <- ifelse(df$time == 0, 100, df$event)

I want to calculate the time between events (`tae`, the time after the last event), so that the ideal output looks like this:

head(ideal_df)
  id time event tae
1  1    0   100   0
2  1   12   100   0
3  1   24   100   0
4  1   36   100   0
5  1   48     0  12
6  1   60     0  24

In Fortran, I would use the following code to create the `tae` variable:

IF(EVENT.GT.0) THEN
  TEVENT = TIME
  TAE = 0
ENDIF

IF(EVENT.EQ.0) THEN
  TAE = TIME - TEVENT
ENDIF
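
For comparison, the Fortran logic above can be replayed almost line for line in R with an explicit loop. This is only a minimal sketch: the helper name `tae_loop` is hypothetical, the data are a small illustrative single-subject example, and it assumes rows are sorted by time with an event on each subject's first row.

```r
# Illustrative data: one subject, uneven spacing between observations
df <- data.frame(
  id    = rep(1, 9),
  time  = c(0, 10, 22, 33, 45, 57, 66, 79, 92),
  event = c(100, 0, 0, 100, 0, 100, 0, 0, 100)
)

# Hypothetical helper: replays the Fortran IF blocks row by row for one subject.
# Assumes the rows are sorted by time and the first row is an event.
tae_loop <- function(time, event) {
  tae <- numeric(length(time))
  tevent <- NA_real_              # time of the most recent event seen so far
  for (i in seq_along(time)) {
    if (event[i] > 0) {
      tevent <- time[i]
      tae[i] <- 0
    } else {
      tae[i] <- time[i] - tevent
    }
  }
  tae
}

# Apply the helper separately within each id so tevent never leaks across subjects
df$tae <- NA_real_
for (g in unique(df$id)) {
  rows <- df$id == g
  df$tae[rows] <- tae_loop(df$time[rows], df$event[rows])
}
df$tae
# 0 10 22 0 12 0 9 22 0
```

The loop is slower than a vectorized approach on large data, but it keeps the one-to-one correspondence with the Fortran version easy to check.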

In R, I tried an `ifelse` solution and a `dplyr` solution. However, neither produces the output I want.

# Calculate the time since last event (using ifelse)
df$tae <- ifelse(df$event >= 0, df$tevent = df$time & df$tae = 0, df$tae = df$time - df$tevent)

Error: unexpected '=' in "df$tae <- ifelse(df$event >= 0, df$tevent ="

# Calculate the time since last event (using dplyr)
res <- df %>%
  arrange(id, time) %>%
  group_by(id) %>%
  mutate(tae = time - lag(time))
res 

   id time event tae
1   1    0   100  NA
2   1   12   100  12
3   1   24   100  12
4   1   36   100  12
5   1   48     0  12
6   1   60     0  12

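Clearly, neither of these produces the output I want. It seems that `ifelse` in R does not tolerate assigning variables inside the function call. My attempt at a `dplyr` solution also fails to take the `event` variable into account...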
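
One way around the assignment problem is to stop treating `ifelse` as a control-flow statement: it returns a vector, so it can supply the "time of the last event" values directly. The sketch below is an assumption-laden illustration, not a definitive fix: it uses made-up data with two subjects, assumes rows are sorted by increasing time within each `id`, and relies on base R's `ave()` and `cummax()` to carry the last event time forward.

```r
# Illustrative data: two subjects, each starting with an event at time 0
df <- data.frame(
  id    = c(rep(1, 9), rep(2, 3)),
  time  = c(0, 10, 22, 33, 45, 57, 66, 79, 92, 0, 5, 15),
  event = c(100, 0, 0, 100, 0, 100, 0, 0, 100, 100, 0, 100)
)

# ifelse() returns a vector: the event time on event rows, -Inf elsewhere.
# cummax() within each id then carries the most recent event time forward.
last_event <- ave(ifelse(df$event > 0, df$time, -Inf), df$id, FUN = cummax)

# Time after the last event; 0 on the event rows themselves
df$tae <- df$time - last_event
df$tae
# 0 10 22 0 12 0 9 22 0 0 5 0
```

If a subject had non-event rows before its first event, this sketch would give `Inf` on those rows, so they would need separate handling.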

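Finally, I also need another variable, `tue`, that records the time until the next event. If anyone happens to have thoughts on how best to do this (perhaps trickier) calculation, please feel free to share.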
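
The time until the next event can be sketched as the mirror image of the time since the last one: look ahead instead of back. The example below is a hedged illustration on made-up data, again assuming rows sorted by increasing time within each `id`. Returning `NA` where a subject has no later event is a choice made for this sketch, not something specified in the question.

```r
# Illustrative data: subject 2 ends without a further event
df <- data.frame(
  id    = c(rep(1, 9), rep(2, 3)),
  time  = c(0, 10, 22, 33, 45, 57, 66, 79, 92, 0, 5, 15),
  event = c(100, 0, 0, 100, 0, 100, 0, 0, 100, 100, 0, 0)
)

# Event time on event rows, Inf elsewhere; a reversed cummin() within each id
# pulls the time of the next event back onto the rows that precede it.
next_event <- ave(ifelse(df$event > 0, df$time, Inf), df$id,
                  FUN = function(x) rev(cummin(rev(x))))

# Time until the next event; NA when the subject has no later event
df$tue <- next_event - df$time
df$tue[is.infinite(df$tue)] <- NA
df$tue
# 0 23 11 0 12 0 26 13 0 0 NA NA
```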

Any ideas on how to get either of these working (or alternative solutions) would be greatly appreciated. Thanks!

PS - A reproducible example in which the interval between events varies within an ID is shown below:

id <- rep(1, 9)
time <- c(0, 10, 22, 33, 45, 57, 66, 79, 92)
event <- c(100, 0, 0, 100, 0, 100, 0, 0, 100)
df <- data.frame(cbind(id, time, event))

head(df)
  id time event
1  1    0   100
2  1   10     0
3  1   22     0
4  1   33   100
5  1   45     0
6  1   57   100

Sve*_*ein 8

Here is an approach with `dplyr`:

library(dplyr)
df %>%
  mutate(tmpG = cumsum(c(FALSE, as.logical(diff(event))))) %>%
  group_by(id) %>%
  mutate(tmp_a = c(0, diff(time)) * !event,
         tmp_b = c(diff(time), 0) * !event) %>%
  group_by(tmpG) %>%
  mutate(tae = cumsum(tmp_a),
         tbe = rev(cumsum(rev(tmp_b)))) %>%
  ungroup() %>%
  select(-c(tmp_a, tmp_b, tmpG))

The new columns contain the time after an event (`tae`) and the time before an event (`tbe`).
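
To see why the pipeline works, it can help to trace the intermediate columns by hand. The sketch below rebuilds `tmpG`, `tmp_a`, and `tae` in base R for the single-subject example from the question's PS. Using `ave()` in place of `group_by()` plus `mutate()` is a substitution made here for illustration.

```r
# Single-subject example from the question's PS
time  <- c(0, 10, 22, 33, 45, 57, 66, 79, 92)
event <- c(100, 0, 0, 100, 0, 100, 0, 0, 100)

# TRUE wherever event differs from the previous row;
# cumsum() turns those change points into a run id.
tmpG <- cumsum(c(FALSE, as.logical(diff(event))))
tmpG
# 0 1 1 2 3 4 5 5 6

# Gap to the previous row, zeroed on event rows (!event is FALSE when event > 0)
tmp_a <- c(0, diff(time)) * !event

# Running total inside each run gives the time after the last event
tae <- ave(tmp_a, tmpG, FUN = cumsum)
tae
# 0 10 22 0 12 0 9 22 0
```

Each run of consecutive non-event rows gets its own `tmpG` value, so the cumulative sum restarts after every event.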

Result:

   id time event tae tbe
1   1    0   100   0   0
2   1   12   100   0   0
3   1   24   100   0   0
4   1   36   100   0   0
5   1   48     0  12  48
6   1   60     0  24  36
7   1   72     0  36  24
8   1   84     0  48  12
9   1   96   100   0   0
10  2    0   100   0   0
11  2   12     0  12  24
12  2   24     0  24  12
13  2   36   100   0   0
14  2   48     0  12  48
15  2   60     0  24  36
16  2   72     0  36  24
17  2   84     0  48  12
18  2   96     0  60   0
19  3    0   100   0   0
20  3   12   100   0   0
21  3   24     0  12  24
22  3   36     0  24  12
23  3   48   100   0   0
24  3   60   100   0   0
25  3   72   100   0   0
26  3   84     0  12  12
27  3   96   100   0   0

Result with the second example:

  id time event tae tbe
1  1    0   100   0   0
2  1   10     0  10  23
3  1   22     0  22  11
4  1   33   100   0   0
5  1   45     0  12  12
6  1   57   100   0   0
7  1   66     0   9  26
8  1   79     0  22  13
9  1   92   100   0   0

  • This is a very nice solution, +1. If you want to remove the temporary variable `tmp2`, you should insert `ungroup()` before `select(-tmp, -tmp2)`. (2 upvotes)