I have a data frame that tracks service involvement (srvc_inv, coded 1 or 0) for an individual (Bob) over a timeframe of interest (one row per year, 1900-1999).
library(tidyverse)
dat <- data.frame(name = rep("Bob", 100),
                  day = seq(as.Date("1900/1/1"), as.Date("1999/1/1"), "years"),
                  srvc_inv = c(rep(0, 25), rep(1, 25), rep(0, 25), rep(1, 25)))
As we can see, Bob has two service episodes: one in rows 26:50 and the other in rows 76:100.
If we only want to know whether Bob had any service involvement during the timeframe, we can use a simple max() call, as shown below.
dat %>%
  group_by(name) %>%
  summarise(ever_inv = max(srvc_inv))
However, I would like to determine the number of service episodes that Bob had during the timeframe of interest (in this case, 2). A distinct service episode is identified by a break in service involvement over consecutive dates. Does anybody have an idea how to program this? Thanks!
Another solution, based on base R's rle:
library(dplyr)
# rle() collapses consecutive equal values into runs; each run of 1s is one episode
dat %>%
  group_by(name) %>%
  summarise(ever_inv = length(with(rle(srvc_inv), lengths[values == 1])))
# A tibble: 1 x 2
  name  ever_inv
  <fct>    <int>
1 Bob          2
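To see why this works, it may help to look at what rle returns for Bob's srvc_inv vector on its own. The sketch below uses only base R; the names runs, n_episodes_rle and n_episodes_diff are made up for illustration, and the diff-based count is an alternative technique, not part of the answer above. Both counts assume the rows are already sorted by date and that every period in the timeframe has a row.

```r
# Same srvc_inv pattern as in the question: 25 off, 25 on, 25 off, 25 on
srvc_inv <- c(rep(0, 25), rep(1, 25), rep(0, 25), rep(1, 25))

# rle() collapses consecutive equal values into runs
runs <- rle(srvc_inv)
runs$values   # one entry per run: 0 1 0 1
runs$lengths  # length of each run: 25 25 25 25

# Each run of 1s is one service episode
n_episodes_rle <- sum(runs$values == 1)

# Alternative (an assumption, not from the answer): count 0 -> 1 transitions,
# treating the period before the first row as "not involved"
n_episodes_diff <- sum(diff(c(0, srvc_inv)) == 1)

n_episodes_rle
n_episodes_diff
```

Both counts come out as 2 for this data. If the data frame could be unsorted, calling arrange(name, day) before group_by() would be a reasonable precaution.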