Ent*_*opy 4 time if-statement r dplyr
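I have a data frame with multiple subjects (id), each with repeated observations (recorded at time). Each time point may or may not be associated with an event (event). An example data frame can be generated with: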
set.seed(12345)
id <- c(rep(1, 9), rep(2, 9), rep(3, 9))
time <- c(seq(from = 0, to = 96, by = 12),
          seq(from = 0, to = 80, by = 10),
          seq(from = 0, to = 112, by = 14))
random <- runif(n = 27)
event <- rep(100, 27)
df <- data.frame(cbind(id, time, event, random))
# Keep an event only where the random draw is at least 0.55
df$event <- ifelse(df$random < 0.55, 0, df$event)
df <- subset(df, select = -c(random))
# Every id starts with an event at time 0
df$event <- ifelse(df$time == 0, 100, df$event)
I would like to calculate the time elapsed since the last event (tae, time after event), so that the ideal output looks like:
head(ideal_df)
  id time event tae
1  1    0   100   0
2  1   12   100   0
3  1   24   100   0
4  1   36   100   0
5  1   48     0  12
6  1   60     0  24
In Fortran, I would use the following code to create the tae variable:
IF(EVENT.GT.0) THEN
TEVENT = TIME
TAE = 0
ENDIF
IF(EVENT.EQ.0) THEN
TAE = TIME - TEVENT
ENDIF
In R, I have tried both an ifelse solution and a dplyr solution. However, neither produces the output I want.
# Calculate the time since last event (using ifelse)
df$tae <- ifelse(df$event >= 0, df$tevent = df$time & df$tae = 0, df$tae = df$time - df$tevent)
Error: unexpected '=' in "df$tae <- ifelse(df$event >= 0, df$tevent ="
# Calculate the time since last event (using dplyr)
res <- df %>%
  arrange(id, time) %>%
  group_by(id) %>%
  mutate(tae = time - lag(time))
res
  id time event tae
1  1    0   100  NA
2  1   12   100  12
3  1   24   100  12
4  1   36   100  12
5  1   48     0  12
6  1   60     0  12
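Clearly, neither of these produces the output I want. It appears that ifelse in R does not tolerate assigning variables inside the function call. My attempt at a dplyr solution also fails to take the event variable into account.
Lastly, I will also need another variable, tue, that records the time until the next event. If anyone has thoughts on how best to go about this (perhaps trickier) calculation, please feel free to share.
Any ideas on how to get one of these approaches working (or an alternative solution) would be greatly appreciated. Thanks!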
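As a rough illustration of why the ifelse attempt fails: ifelse() is a vectorised function that returns a vector, so its second and third arguments are values to choose from, not statements to execute. The sketch below, which assumes a single id whose first row is an event and uses the time and event vectors from the PS example, builds the helper vector (called tevent here, after the Fortran variable) in a separate step and then subtracts. The name last_idx is introduced only for this illustration.

```r
# Data from the PS example in the question (single id)
time  <- c(0, 10, 22, 33, 45, 57, 66, 79, 92)
event <- c(100, 0, 0, 100, 0, 100, 0, 0, 100)

# ifelse() returns a value for every element; assign its result outside the call.
# Row index of each event, 0 elsewhere; cummax() carries the latest index forward.
# Assumption: the first row is an event, so last_idx never contains 0.
last_idx <- cummax(ifelse(event > 0, seq_along(event), 0L))

tevent <- time[last_idx]  # time of the most recent event at each row
tae    <- time - tevent   # time after the last event

tae
# [1]  0 10 22  0 12  0  9 22  0
```

With several ids, the same steps would have to be applied within each id, for example after group_by(id).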
PS: a reproducible example in which the interval between events varies within an ID is given below:
id <- rep(1, 9)
time <- c(0, 10, 22, 33, 45, 57, 66, 79, 92)
event <- c(100, 0, 0, 100, 0, 100, 0, 0, 100)
df <- data.frame(cbind(id, time, event))
head(df)
  id time event
1  1    0   100
2  1   10     0
3  1   22     0
4  1   33   100
5  1   45     0
6  1   57   100
Here is one approach with dplyr:
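library(dplyr)
df %>%
  # tmpG: run id that increments every time event changes value
  mutate(tmpG = cumsum(c(FALSE, as.logical(diff(event))))) %>%
  group_by(id) %>%
  # Time steps per row, zeroed on event rows (!event is TRUE only where event == 0)
  mutate(tmp_a = c(0, diff(time)) * !event,
         tmp_b = c(diff(time), 0) * !event) %>%
  group_by(tmpG) %>%
  # Accumulate forward for tae and backward for tbe within each run
  mutate(tae = cumsum(tmp_a),
         tbe = rev(cumsum(rev(tmp_b)))) %>%
  ungroup() %>%
  select(-c(tmp_a, tmp_b, tmpG))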
The new columns are the time after the last event (tae) and the time before the next event (tbe).
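To make the grouping step easier to follow, here is a small sketch of the run-id trick in base R, using the time and event vectors from the question's PS example. The names changed and run_id are illustrative only, and ave() stands in for the grouped mutate() calls; this is a way to inspect the intermediate values, not a replacement for the pipeline.

```r
# Data from the PS example in the question (single id)
time  <- c(0, 10, 22, 33, 45, 57, 66, 79, 92)
event <- c(100, 0, 0, 100, 0, 100, 0, 0, 100)

# TRUE wherever event differs from the previous row
changed <- c(FALSE, as.logical(diff(event)))
# Cumulative count of changes gives one id per run of identical event values
run_id <- cumsum(changed)
run_id
# [1] 0 1 1 2 3 4 5 5 6

# Time steps, zeroed on event rows (!event is TRUE only where event == 0)
tmp_a <- c(0, diff(time)) * !event
tmp_b <- c(diff(time), 0) * !event

# Forward cumulative sum within each run gives tae, backward gives tbe
tae <- ave(tmp_a, run_id, FUN = cumsum)
tbe <- ave(tmp_b, run_id, FUN = function(x) rev(cumsum(rev(x))))
```

Each run of consecutive non-event rows gets its own run_id, so the cumulative sums restart after every event.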
Result of the dplyr approach (in this output every id has time running from 0 to 96 in steps of 12):
   id time event tae tbe
1   1    0   100   0   0
2   1   12   100   0   0
3   1   24   100   0   0
4   1   36   100   0   0
5   1   48     0  12  48
6   1   60     0  24  36
7   1   72     0  36  24
8   1   84     0  48  12
9   1   96   100   0   0
10  2    0   100   0   0
11  2   12     0  12  24
12  2   24     0  24  12
13  2   36   100   0   0
14  2   48     0  12  48
15  2   60     0  24  36
16  2   72     0  36  24
17  2   84     0  48  12
18  2   96     0  60   0
19  3    0   100   0   0
20  3   12   100   0   0
21  3   24     0  12  24
22  3   36     0  24  12
23  3   48   100   0   0
24  3   60   100   0   0
25  3   72   100   0   0
26  3   84     0  12  12
27  3   96   100   0   0
Result of the dplyr approach with the second example:
  id time event tae tbe
1  1    0   100   0   0
2  1   10     0  10  23
3  1   22     0  22  11
4  1   33   100   0   0
5  1   45     0  12  12
6  1   57   100   0   0
7  1   66     0   9  26
8  1   79     0  22  13
9  1   92   100   0   0
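As a cross-check, the Fortran logic from the question can also be written as a plain loop in base R. This is only a sketch: the helper names time_after_event and time_before_event are hypothetical, and the backward loop for tbe is an assumed mirror image of the Fortran snippet, which only covers tae. Both helpers expect the rows of one id, sorted by time.

```r
# Hypothetical helper: time since the most recent event (mirrors the Fortran logic)
time_after_event <- function(time, event) {
  tae <- numeric(length(time))
  tevent <- NA_real_                 # no event seen yet
  for (i in seq_along(time)) {
    if (event[i] > 0) {
      tevent <- time[i]
      tae[i] <- 0
    } else {
      tae[i] <- time[i] - tevent     # NA if no earlier event exists
    }
  }
  tae
}

# Hypothetical helper: time until the next event (same loop, run backwards)
time_before_event <- function(time, event) {
  tbe <- numeric(length(time))
  tnext <- NA_real_                  # no later event seen yet
  for (i in rev(seq_along(time))) {
    if (event[i] > 0) {
      tnext <- time[i]
      tbe[i] <- 0
    } else {
      tbe[i] <- tnext - time[i]      # NA if no later event exists
    }
  }
  tbe
}

# Second example from the question
df <- data.frame(id = rep(1, 9),
                 time = c(0, 10, 22, 33, 45, 57, 66, 79, 92),
                 event = c(100, 0, 0, 100, 0, 100, 0, 0, 100))

# Apply per id: ave() hands each group's row indices to the helper
idx <- seq_len(nrow(df))
df$tae <- ave(idx, df$id, FUN = function(i) time_after_event(df$time[i], df$event[i]))
df$tbe <- ave(idx, df$id, FUN = function(i) time_before_event(df$time[i], df$event[i]))
```

On the second example this reproduces the tae and tbe columns shown above. One difference in design: for rows with no later event in the same id, this sketch returns NA for tbe, whereas the dplyr pipeline accumulates the remaining time steps up to the last observation.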