I have a data frame:
df <- data.frame(
"Quarter" = c("Q1 2019","Q1 2019","Q1 2019","Q2 2019","Q2 2019","Q2 2019","Q2 2019","Q3 2019","Q3 2019","Q3 2019","Q3 2019","Q4 2019","Q4 2019"),
"Name" = c("Ram","John","Jack","Ram","Rach","Will","John","Ram","Rach","Will","John","Rach","John"),
stringsAsFactors = FALSE
)
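For each quarter, I need to count how many people were added and how many left, compared with the previous quarter.
The expected output is: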
quarterYear status Count
1 Q1 2019 Added 3
2 Q1 2019 Left 0
3 Q2 2019 Added 2
4 Q2 2019 Left 1
5 Q3 2019 Added 0
6 Q3 2019 Left 0
7 Q4 2019 Added 0
8 Q4 2019 Left 2
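I'm not sure how to compare the two groups and get the counts.
How can I achieve the expected output in R?
I'm not sure about the speed implications, but a large part of this is essentially comparing consecutive counts, so diff came to mind.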
tab <- table(df$Quarter, df$Name)
tab <- rbind(tab[1,,drop=FALSE], diff(tab))
out <- rbind(added = rowSums(tab == 1), left = rowSums(tab == -1))
# Q1 2019 Q2 2019 Q3 2019 Q4 2019
#added 3 2 0 0
#left 0 1 0 2
If you specifically need long output:
setNames(data.frame(as.table(out)), c("status","quarter","count"))
# status quarter count
#1 added Q1 2019 3
#2 left Q1 2019 0
#3 added Q2 2019 2
#4 left Q2 2019 1
#5 added Q3 2019 0
#6 left Q3 2019 0
#7 added Q4 2019 0
#8 left Q4 2019 2
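To see why the diff approach above works, it may help to inspect the intermediate matrix. The following is only a minimal sketch; the names presence and change are illustrative and not part of the original answer. It assumes each name appears at most once per quarter, so that every cell of the table is 0 or 1.

```r
df <- data.frame(
  Quarter = c("Q1 2019","Q1 2019","Q1 2019","Q2 2019","Q2 2019","Q2 2019","Q2 2019",
              "Q3 2019","Q3 2019","Q3 2019","Q3 2019","Q4 2019","Q4 2019"),
  Name = c("Ram","John","Jack","Ram","Rach","Will","John",
           "Ram","Rach","Will","John","Rach","John"),
  stringsAsFactors = FALSE
)

# Presence matrix: one row per quarter, one column per name (1 = present).
# Assumption: no name is repeated within a quarter.
presence <- table(df$Quarter, df$Name)

# Row-wise difference from the previous quarter:
#  1 = name was added, -1 = name left, 0 = no change.
change <- diff(presence)
print(change)
```

In the Q2 2019 row, Rach and Will are 1 (added) and Jack is -1 (left), which matches the counts above. Note that table orders its rows by the sorted quarter labels; with data spanning several years, a label such as "Q1 2020" would sort before "Q2 2019", so the quarters would probably need to be converted to an explicitly ordered factor first.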
The following first collapses the Name column into a list of names per Quarter, then compares each Quarter with the previous one using purrr::map2_int and setdiff.
Finally, the two new columns, Added and Left, are pivoted into long form using tidyr::pivot_longer.
library(tidyverse)
df %>%
  group_by(Quarter) %>%
  # one row per quarter, holding that quarter's names as a list element
  summarise(names = list(Name)) %>%
  # .x = current quarter's names, .y = previous quarter's names (empty for the first quarter)
  mutate(Added = map2_int(names, lag(names, default = list(list())), ~ length(setdiff(.x, .y))),
         Left = map2_int(names, lag(names, default = list(list())), ~ length(setdiff(.y, .x)))) %>%
  pivot_longer(Added:Left, names_to = "status", values_to = "Count") %>%
  select(-names)
Result:
# A tibble: 8 x 3
Quarter status Count
<chr> <chr> <int>
1 Q1 2019 Added 3
2 Q1 2019 Left 0
3 Q2 2019 Added 2
4 Q2 2019 Left 1
5 Q3 2019 Added 0
6 Q3 2019 Left 0
7 Q4 2019 Added 0
8 Q4 2019 Left 2
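For readers who want the same setdiff comparison without loading tidyverse, a base R version could look roughly like the sketch below. It is not taken from either answer; the names quarters, members, previous and result are illustrative. It assumes the rows of df are already in chronological order, since quarters are taken in order of first appearance rather than sorted.

```r
df <- data.frame(
  Quarter = c("Q1 2019","Q1 2019","Q1 2019","Q2 2019","Q2 2019","Q2 2019","Q2 2019",
              "Q3 2019","Q3 2019","Q3 2019","Q3 2019","Q4 2019","Q4 2019"),
  Name = c("Ram","John","Jack","Ram","Rach","Will","John",
           "Ram","Rach","Will","John","Rach","John"),
  stringsAsFactors = FALSE
)

# Quarters in order of first appearance (assumed chronological).
quarters <- unique(df$Quarter)

# Distinct names present in each quarter.
members <- lapply(quarters, function(q) unique(df$Name[df$Quarter == q]))

# Names of the preceding quarter; the first quarter is compared with an empty set.
previous <- c(list(character(0)), head(members, -1))

added <- mapply(function(cur, prev) length(setdiff(cur, prev)), members, previous)
left  <- mapply(function(cur, prev) length(setdiff(prev, cur)), members, previous)

# Interleave the counts so each quarter gets an Added row followed by a Left row.
result <- data.frame(
  quarterYear = rep(quarters, each = 2),
  status = rep(c("Added", "Left"), times = length(quarters)),
  Count = as.vector(rbind(added, left))
)
print(result)
```

Because it compares sets of names rather than counts, this variant should also tolerate a name appearing more than once within a quarter.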