How to get frequency counts using column breaks by row?

Question

I have a data frame that tracks service involvement (srvc_inv, coded 1 or 0) for one individual (Bob) over a timeframe of interest (1900-1999), with one row per year.

library(tidyverse)

dat <- data.frame(name = rep("Bob", 100),
              day = seq(as.Date("1900/1/1"), as.Date("1999/1/1"), "years"),
              srvc_inv = c(rep(0, 25), rep(1, 25), rep(0, 25), rep(1, 25)))

As we can see, Bob has two service episodes: one in rows 26 to 50 and the other in rows 76 to 100.

If we want to determine any service involvement for Bob during the timeframe, we can use a simple max statement as shown below.

dat %>% 
  group_by(name) %>% 
  summarise(ever_inv = max(srvc_inv))

However, I would like to count the number of distinct service episodes that Bob had during the timeframe of interest (in this case, 2). A distinct service episode is a run of consecutive dates with service involvement, bounded by a break in involvement. Does anyone know how to program this? Thanks!

Answer 1

A. *_*man 4

Another solution, based on base R's rle:

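library(dplyr)
# rle() collapses srvc_inv into runs of equal values;
# each run whose value is 1 is one service episode
dat %>% group_by(name) %>% 
        summarise(ever_inv = length(with(rle(srvc_inv), lengths[values==1])))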

# A tibble: 1 x 2
  name  ever_inv
  <fct>    <int>
1 Bob          2
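
To see why this works, it may help to look at what rle returns for the question's vector. The following is a minimal base R sketch, not part of the original answer. The names runs, n_episodes and n_starts are illustrative, and the cross-check with diff is an alternative swapped in for comparison. It assumes the rows are already sorted by date, that consecutive rows are consecutive dates, and that srvc_inv has no missing values.

```r
# Same values as srvc_inv in the question: 25 zeros, 25 ones, 25 zeros, 25 ones
srvc_inv <- c(rep(0, 25), rep(1, 25), rep(0, 25), rep(1, 25))

# rle() collapses consecutive equal values into runs
runs <- rle(srvc_inv)
runs$values   # the value of each run, in order
runs$lengths  # how many consecutive rows each run covers

# Each run whose value is 1 is one service episode
n_episodes <- sum(runs$values == 1)

# Cross-check (illustrative alternative): count 0 -> 1 transitions,
# treating the series as if it were preceded by a 0
n_starts <- sum(diff(c(0, srvc_inv)) == 1)

n_episodes
n_starts
```

Both counts should agree on this data. Because rle works on row order alone, a grouped data frame would need to be arranged by date within each name before summarising.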
