Tags: r, window, count, dplyr
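I am trying to count the number of unique "new" users per month. A new user is one who has not appeared before (since the start of the data). I am also trying to count the number of unique users who did not appear in the previous month.
The raw data looks like this: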
library(dplyr)
date <- c("2010-01-10","2010-02-13","2010-03-22","2010-01-11","2010-02-14","2010-03-23","2010-01-12","2010-02-14","2010-03-24")
mth <- rep(c("2010-01","2010-02","2010-03"),3)
user <- c("123","129","145","123","129","180","180","184","145")
dt <- data.frame(date,mth,user)
dt <- dt %>% arrange(date)
dt
        date     mth user
1 2010-01-10 2010-01  123
2 2010-01-11 2010-01  123
3 2010-01-12 2010-01  180
4 2010-02-13 2010-02  129
5 2010-02-14 2010-02  129
6 2010-02-14 2010-02  184
7 2010-03-22 2010-03  145
8 2010-03-23 2010-03  180
9 2010-03-24 2010-03  145
The answer should look like this:
new <- c(2,2,2,2,2,2,1,1,1)
totNew <- c(2,2,2,4,4,4,5,5,5)
notLastMonth <- c(2,2,2,2,2,2,2,2,2)
tmp <- cbind(dt,new,totNew,notLastMonth)
tmp
        date     mth user new totNew notLastMonth
1 2010-01-10 2010-01  123   2      2            2
2 2010-01-11 2010-01  123   2      2            2
3 2010-01-12 2010-01  180   2      2            2
4 2010-02-13 2010-02  129   2      4            2
5 2010-02-14 2010-02  129   2      4            2
6 2010-02-14 2010-02  184   2      4            2
7 2010-03-22 2010-03  145   1      5            2
8 2010-03-23 2010-03  180   1      5            2
9 2010-03-24 2010-03  145   1      5            2
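To make the three definitions concrete, here is a minimal base R sketch that works on sets of users per month. The names `by_month`, `seen_before`, `prev_month`, `new_users`, `tot_new` and `not_last_month` are hypothetical and chosen only for illustration; the sketch assumes the month labels sort chronologically as strings, which holds for the `YYYY-MM` format used here.

```r
# Same sample data as in the question, already ordered by date
mth  <- c("2010-01", "2010-01", "2010-01",
          "2010-02", "2010-02", "2010-02",
          "2010-03", "2010-03", "2010-03")
user <- c("123", "123", "180", "129", "129", "184", "145", "180", "145")

# Unique users seen in each month (split() orders the months alphabetically)
by_month <- lapply(split(user, mth), unique)

# Users seen in any earlier month: running union, shifted by one month
# (init = character(0) makes the first month compare against an empty set)
seen_before <- head(Reduce(union, by_month, accumulate = TRUE,
                           init = character(0)), -1)

# Users seen in the immediately preceding month (empty for the first month)
prev_month <- c(list(character(0)), head(by_month, -1))

new_users      <- lengths(Map(setdiff, by_month, seen_before))  # never seen before
tot_new        <- cumsum(new_users)                             # running total of new users
not_last_month <- lengths(Map(setdiff, by_month, prev_month))   # absent last month

# Expand the per-month counts back to one value per original row
result <- data.frame(mth, user,
                     new          = new_users[mth],
                     totNew       = tot_new[mth],
                     notLastMonth = not_last_month[mth],
                     row.names    = NULL)
result
```

Indexing the named count vectors by `mth` repeats each monthly value for every row of that month, which is the shape of the desired output.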
Here is one attempt (explanations are in the code comments):
dt %>%
  group_by(user) %>%
  mutate(Count = row_number()) %>%   # Count appearances per user
  group_by(mth) %>%
  mutate(new = sum(Count == 1)) %>%  # Count first appearances per month
  summarise(new = first(new),        # Summarise new users per month (for cumsum)
            users = list(unique(user))) %>%  # Create a list of unique users per month (for notLastMonth)
  mutate(totNew = cumsum(new),       # Calculate the overall cumulative sum of new users
         notLastMonth = lengths(Map(setdiff, users, lag(users)))) %>%  # Compare each month's users to the previous month's
  select(-users) %>%
  right_join(dt)                     # Join back to the original data
# A tibble: 9 × 6
#       mth   new totNew notLastMonth       date   user
#    <fctr> <int>  <int>        <int>     <fctr> <fctr>
# 1 2010-01     2      2            2 2010-01-10    123
# 2 2010-01     2      2            2 2010-01-11    123
# 3 2010-01     2      2            2 2010-01-12    180
# 4 2010-02     2      4            2 2010-02-13    129
# 5 2010-02     2      4            2 2010-02-14    129
# 6 2010-02     2      4            2 2010-02-14    184
# 7 2010-03     2      5            2 2010-03-22    145
# 8 2010-03     1      5            2 2010-03-23    180
# 9 2010-03     1      5            2 2010-03-24    145
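As a cross-check on the `new` and `totNew` columns, the same monthly counts can be sketched a different way: tag each row with the month of that user's first appearance, then count distinct users whose first month is the current one. This is only an alternative sketch, not the approach above; `first_mth`, `is_first` and `monthly` are hypothetical names, and it assumes `mth` is stored as character (hence `stringsAsFactors = FALSE`) so that `min()` picks the earliest `YYYY-MM` label.

```r
library(dplyr)

# Same sample data as in the question, kept as character columns
dt <- data.frame(
  date = c("2010-01-10", "2010-01-11", "2010-01-12",
           "2010-02-13", "2010-02-14", "2010-02-14",
           "2010-03-22", "2010-03-23", "2010-03-24"),
  mth  = rep(c("2010-01", "2010-02", "2010-03"), each = 3),
  user = c("123", "123", "180", "129", "129", "184", "145", "180", "145"),
  stringsAsFactors = FALSE
)

monthly <- dt %>%
  group_by(user) %>%
  mutate(first_mth = min(mth),            # month of each user's first appearance
         is_first  = mth == first_mth) %>% # TRUE on rows within that first month
  group_by(mth) %>%
  summarise(new = n_distinct(user[is_first])) %>%  # distinct first-time users per month
  mutate(totNew = cumsum(new))                     # running total of new users

monthly
```

The per-month result can be attached to the original rows with `left_join(dt, monthly, by = "mth")` if one value per row is needed.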