Count elements added and removed relative to the previous group of a data frame

Nev*_*nar 6 r dataframe dplyr

I have a data frame:

df <- data.frame(
  "Quarter" = c("Q1 2019","Q1 2019","Q1 2019","Q2 2019","Q2 2019","Q2 2019","Q2 2019","Q3 2019","Q3 2019","Q3 2019","Q3 2019","Q4 2019","Q4 2019"),
  "Name" = c("Ram","John","Jack","Ram","Rach","Will","John","Ram","Rach","Will","John","Rach","John"),
  stringsAsFactors = FALSE
) 

I need to count how many people were added and how many left in each quarter, by comparing each quarter with the previous one.
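To make the comparison concrete, here is a minimal base-R sketch for one pair of quarters. It uses only the Q1 and Q2 rows of the data above, and `q1`/`q2` are illustrative names. "Added" means present in the later quarter but not the earlier one, and "Left" is the reverse. The first quarter is compared with an empty set, which is why Q1 2019 shows 3 added and 0 left.

```r
# First two quarters of the question's data
df <- data.frame(
  Quarter = c("Q1 2019","Q1 2019","Q1 2019","Q2 2019","Q2 2019","Q2 2019","Q2 2019"),
  Name    = c("Ram","John","Jack","Ram","Rach","Will","John"),
  stringsAsFactors = FALSE
)

q1 <- df$Name[df$Quarter == "Q1 2019"]  # "Ram" "John" "Jack"
q2 <- df$Name[df$Quarter == "Q2 2019"]  # "Ram" "Rach" "Will" "John"

setdiff(q2, q1)  # "Rach" "Will": in Q2 but not in Q1, so Added = 2
setdiff(q1, q2)  # "Jack": in Q1 but not in Q2, so Left = 1
```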

The expected output is:

quarterYear status Count
1    Q1 2019 Added   3
2    Q1 2019 Left    0
3    Q2 2019 Added   2
4    Q2 2019 Left    1
5    Q3 2019 Added   0
6    Q3 2019 Left    0
7    Q4 2019 Added   0
8    Q4 2019 Left    2 

I'm not sure how to compare two groups and get the counts.

How can I produce the expected output in R?

the*_*ail 5

I'm not sure about the speed implications, but much of this boils down to comparing consecutive counts, so diff came to mind.

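# Quarters x names table of counts (0 or 1 in this data)
tab <- table(df$Quarter, df$Name)
# Keep the first quarter as-is (everyone there is new), then take row-wise
# differences: +1 = joined this quarter, -1 = left since the previous quarter
tab <- rbind(tab[1,,drop=FALSE], diff(tab))
out <- rbind(added = rowSums(tab == 1), left = rowSums(tab == -1))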

#      Q1 2019 Q2 2019 Q3 2019 Q4 2019
#added       3       2       0       0
#left        0       1       0       2
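The diff trick assumes every cell of the table is 0 or 1. If a name could appear twice in the same quarter (the sample data has no such rows), a count going from 1 to 2 would be misread as "added". A hedged variant, sketched below, first collapses the table to presence/absence. It also shows how the same difference matrix `d` (an illustrative name) can list *which* names changed, not just how many.

```r
df <- data.frame(
  Quarter = c("Q1 2019","Q1 2019","Q1 2019","Q2 2019","Q2 2019","Q2 2019","Q2 2019",
              "Q3 2019","Q3 2019","Q3 2019","Q3 2019","Q4 2019","Q4 2019"),
  Name    = c("Ram","John","Jack","Ram","Rach","Will","John",
              "Ram","Rach","Will","John","Rach","John"),
  stringsAsFactors = FALSE
)

# TRUE where a name appears at least once in a quarter; duplicate rows collapse to TRUE
pres <- unclass(table(df$Quarter, df$Name)) > 0
m <- pres * 1L                             # 0/1 integer matrix, quarters x names
d <- rbind(m[1, , drop = FALSE], diff(m))  # first quarter: everyone present counts as added

added <- rowSums(d == 1)   # 3 2 0 0
left  <- rowSums(d == -1)  # 0 1 0 2

# Names behind the Q2 2019 counts
colnames(d)[d["Q2 2019", ] == 1]   # "Rach" "Will"
colnames(d)[d["Q2 2019", ] == -1]  # "Jack"
```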

If you specifically need long-format output:

setNames(data.frame(as.table(out)), c("status","quarter","count"))
#  status quarter count
#1  added Q1 2019     3
#2   left Q1 2019     0
#3  added Q2 2019     2
#4   left Q2 2019     1
#5  added Q3 2019     0
#6   left Q3 2019     0
#7  added Q4 2019     0
#8   left Q4 2019     2
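The long output above uses lowercase labels and a different column order from the question's expected output (quarterYear, status, Count, with "Added"/"Left"). A small sketch that matches it exactly, assuming the same `df` and the same diff-based counts: capitalise the row names of `out`, let `as.data.frame()` name the value column via `responseName`, then reorder the columns.

```r
df <- data.frame(
  Quarter = c("Q1 2019","Q1 2019","Q1 2019","Q2 2019","Q2 2019","Q2 2019","Q2 2019",
              "Q3 2019","Q3 2019","Q3 2019","Q3 2019","Q4 2019","Q4 2019"),
  Name    = c("Ram","John","Jack","Ram","Rach","Will","John",
              "Ram","Rach","Will","John","Rach","John"),
  stringsAsFactors = FALSE
)

tab <- table(df$Quarter, df$Name)
tab <- rbind(tab[1, , drop = FALSE], diff(tab))
# Capitalised row names so the status labels read "Added"/"Left"
out <- rbind(Added = rowSums(tab == 1), Left = rowSums(tab == -1))

# One row per (status, quarter) cell; character columns rather than factors
long <- as.data.frame(as.table(out), responseName = "Count", stringsAsFactors = FALSE)
names(long)[1:2] <- c("status", "quarterYear")
long <- long[, c("quarterYear", "status", "Count")]
long
```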


Bas*_*bo1 1

This works by first collapsing the Name column into a list of names per Quarter, then comparing each Quarter with the previous one using purrr::map2_int; the first Quarter is compared with an empty list (the default of lag), so everyone in it counts as added. Finally, the two new columns, Added and Left, are pivoted into long form with tidyr::pivot_longer.

library(tidyverse)

df %>%
  group_by(Quarter) %>%
  summarise(names = list(Name)) %>%  # one row per quarter, names as a list-column
  # Added: names not in the previous quarter; Left: previous names no longer present
  mutate(Added = map2_int(names, lag(names, default = list(list())), ~ length(setdiff(.x, .y))),
         Left = map2_int(names, lag(names, default = list(list())), ~ length(setdiff(.y, .x)))) %>%
  pivot_longer(Added:Left, names_to = "status", values_to = "Count") %>%
  select(-names)

Result:

# A tibble: 8 x 3
  Quarter status Count
  <chr>   <chr>  <int>
1 Q1 2019 Added      3
2 Q1 2019 Left       0
3 Q2 2019 Added      2
4 Q2 2019 Left       1
5 Q3 2019 Added      0
6 Q3 2019 Left       0
7 Q4 2019 Added      0
8 Q4 2019 Left       2
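Both answers rely on the quarters coming out in chronological order. table() and group_by() order a character Quarter column alphabetically, which is chronological within 2019 but would put "Q1 2020" before "Q2 2019". One way around this, sketched below under the assumption that every label has the exact form "Qn YYYY", is to turn Quarter into a factor whose levels are in chronological order. `quarter_key` is a hypothetical helper, and `df2` is made-up data spanning a year boundary, not data from the question. table() follows factor levels, so the diff-based answer picks the new order up directly, and group_by() on a factor should group in level order as well. The set comparison at the end is a base-R stand-in for the purrr version.

```r
# Hypothetical data spanning a year boundary (not from the question)
df2 <- data.frame(
  Quarter = c("Q4 2019", "Q4 2019", "Q1 2020"),
  Name    = c("Rach", "John", "John"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "Q3 2019" -> 2019 * 4 + 3, a number that sorts chronologically.
# Assumes every label has the exact form "Qn YYYY".
quarter_key <- function(q) {
  as.integer(substr(q, 4, 7)) * 4L + as.integer(substr(q, 2, 2))
}

lv <- unique(df2$Quarter)
lv <- lv[order(quarter_key(lv))]
df2$Quarter <- factor(df2$Quarter, levels = lv)  # "Q4 2019" now precedes "Q1 2020"

# Set comparison with the previous quarter, in factor-level order
by_q  <- split(df2$Name, df2$Quarter)
prev  <- c(list(character(0)), head(by_q, -1))  # the first quarter is compared with nobody
added <- mapply(function(cur, old) length(setdiff(cur, old)), by_q, prev)
left  <- mapply(function(cur, old) length(setdiff(old, cur)), by_q, prev)
```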