itp*_*sen 6 r conditional-statements cumulative-sum dplyr
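I'm trying to compute a running count (i.e., a cumulative sum) that is conditional on other variables and that can be reset for specific values of another variable. I'm working in R and would prefer a dplyr-based solution, if possible.
I'd like to create a variable, cumulative, for the running count according to the following algorithm: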
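- Compute the running count (cumulative) within combinations of id and age.
- Increment the running count (cumulative) by 1 for every trial where accuracy = 0, block = 2, and condition = 1.
- Reset the running count (cumulative) to 0 for each trial where accuracy = 1, block = 2, and condition = 1; the next increment resumes at 1 (not at the previous number).
- For each trial where block != 2, or condition != 1, leave the running count (cumulative) as NA.
Here is a minimal working example: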
mydata <- data.frame(id = c(1,1,1,1,1,1,1,1,1,1,1),
age = c(1,1,1,1,1,1,1,1,1,1,2),
block = c(1,1,2,2,2,2,2,2,2,2,2),
trial = c(1,2,1,2,3,4,5,6,7,8,1),
condition = c(1,1,1,1,1,2,1,1,1,1,1),
accuracy = c(0,0,0,0,0,0,0,1,0,0,0)
)
id age block trial condition accuracy
1 1 1 1 1 0
1 1 1 2 1 0
1 1 2 1 1 0
1 1 2 2 1 0
1 1 2 3 1 0
1 1 2 4 2 0
1 1 2 5 1 0
1 1 2 6 1 1
1 1 2 7 1 0
1 1 2 8 1 0
1 2 2 1 1 0
The expected output is:
id age block trial condition accuracy cumulative
1 1 1 1 1 0 NA
1 1 1 2 1 0 NA
1 1 2 1 1 0 1
1 1 2 2 1 0 2
1 1 2 3 1 0 3
1 1 2 4 2 0 NA
1 1 2 5 1 0 4
1 1 2 6 1 1 0
1 1 2 7 1 0 1
1 1 2 8 1 0 2
1 2 2 1 1 0 1
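We can use case_when to assign the values we need based on our conditions. We then add an extra group_by condition that uses cumsum to start a new group whenever the temp column is 0. In the final mutate step, we temporarily replace the NA values in temp with 0, take the cumsum over it, and then put the NA values back in their original positions to get the final output.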
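library(dplyr)

mydata %>%
  group_by(id, age) %>%
  # temp: 1 = increment, 0 = reset, NA = row is not counted
  mutate(temp = case_when(accuracy == 0 & block == 2 & condition == 1 ~ 1,
                          accuracy == 1 & block == 2 & condition == 1 ~ 0,
                          TRUE ~ NA_real_)) %>%
  ungroup() %>%
  # start a new group at every reset row (temp == 0)
  group_by(id, age, group = cumsum(replace(temp == 0, is.na(temp), 0))) %>%
  # cumulative sum with NA treated as 0, then NA restored on uncounted rows
  mutate(cumulative = replace(cumsum(replace(temp, is.na(temp), 0)),
                              is.na(temp), NA)) %>%
  # group is still a grouping variable here, so select() keeps it in the output
  select(-temp, -group)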
# group id age block trial condition accuracy cumulative
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 0 1 1 1 1 1 0 NA
# 2 0 1 1 1 2 1 0 NA
# 3 0 1 1 2 1 1 0 1
# 4 0 1 1 2 2 1 0 2
# 5 0 1 1 2 3 1 0 3
# 6 0 1 1 2 4 2 0 NA
# 7 0 1 1 2 5 1 0 4
# 8 1 1 1 2 6 1 1 0
# 9 1 1 1 2 7 1 0 1
#10 1 1 1 2 8 1 0 2
#11 1 1 2 2 1 1 0 1
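To see why the reset works, it can help to keep the helper columns instead of dropping them. The following is a minimal sketch of the same idea, not the answer's exact code: it uses coalesce() and if_else() in place of replace(), and the name steps is only illustrative. It assumes dplyr is installed and rebuilds the question's mydata so it runs on its own.

```r
library(dplyr)

# Same data as in the question
mydata <- data.frame(
  id        = c(1,1,1,1,1,1,1,1,1,1,1),
  age       = c(1,1,1,1,1,1,1,1,1,1,2),
  block     = c(1,1,2,2,2,2,2,2,2,2,2),
  trial     = c(1,2,1,2,3,4,5,6,7,8,1),
  condition = c(1,1,1,1,1,2,1,1,1,1,1),
  accuracy  = c(0,0,0,0,0,0,0,1,0,0,0)
)

# `steps` is a hypothetical name; temp and group are kept for inspection
steps <- mydata %>%
  mutate(
    # 1 = increment, 0 = reset, NA = row does not take part in the count
    temp = case_when(
      block == 2 & condition == 1 & accuracy == 0 ~ 1,
      block == 2 & condition == 1 & accuracy == 1 ~ 0,
      TRUE ~ NA_real_
    ),
    # counter that goes up by one at every reset row (NA counts as "no reset")
    group = cumsum(coalesce(temp == 0, FALSE))
  ) %>%
  group_by(id, age, group) %>%
  mutate(
    # sum within each (id, age, group) run, then blank out the uncounted rows
    cumulative = if_else(is.na(temp), NA_real_, cumsum(coalesce(temp, 0)))
  ) %>%
  ungroup()

print(as.data.frame(steps))
```

On the example data, group stays at 0 for the first seven rows and switches to 1 at the row where accuracy = 1, which is what makes the count restart there. Because of the ungroup() at the end, a later select(-temp, -group) would drop both helper columns.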