Extracting event types from the most recent 21-day window

gib*_*z00 3 r zoo dplyr

My data frame looks like this. The two rightmost columns are the ones I want to create.

Name   ActivityType   ActivityDate   Email (last 21 days)   Webinar (last 21 days)
John   Email          1/1/2014       NA                     NA
John   Webinar        1/5/2014       NA                     NA
John   Sale           1/20/2014      Yes                    Yes
John   Webinar        3/25/2014      NA                     NA
John   Sale           4/1/2014       No                     Yes
John   Sale           7/1/2014       No                     No
Tom    Email          1/1/2015       NA                     NA
Tom    Webinar        1/5/2015       NA                     NA
Tom    Sale           1/20/2015      Yes                    Yes
Tom    Webinar        3/25/2015      NA                     NA
Tom    Sale           4/1/2015       No                     Yes
Tom    Sale           7/1/2015       No                     No

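I simply want to create a Yes/No variable that indicates, for each "Sale" transaction, whether there was an email or a webinar in the preceding 21 days. I was thinking of something along these lines with dplyr (mock code):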

# Mock code, not runnable: the date condition is only described in words
custlife %>%
  group_by(Name) %>%
  mutate(Email_last21days = lag(ifelse(ActivityType == "Email" &
    <ActivityDate of email within (ActivityDate of sale - 21)>, "Yes", "No")))

I'm not sure how to implement this. Any help would be much appreciated!

Dav*_*urg 5

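Here is a possible data.table solution. I create two temporary data sets, one for the Sale activity type and one for all the other activity types, and then join them with a rolling window of 21 days, using by = .EACHI to check the conditions within each join. Finally, I join the result back to the original data set.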

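Convert the date column to a date class (IDate) and key the data by Name and ActivityDate (needed for the rolling join and the final join)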

library(data.table)
setkey(setDT(df)[, ActivityDate := as.IDate(ActivityDate, "%m/%d/%Y")], Name, ActivityDate)
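As a small illustration of the conversion and keying step, here is a sketch on a made-up three-row table (the name toy is hypothetical and not part of the answer). It assumes the dates are stored as month/day/year strings, as in the question, and shows that setkey() also sorts the rows by the key columns.

# Toy data, for illustration only
library(data.table)

toy <- data.table(
  Name         = c("Tom", "John", "John"),
  ActivityDate = c("1/1/2015", "1/20/2014", "1/5/2014")
)

# Replace the character column with an IDate column
toy[, ActivityDate := as.IDate(ActivityDate, format = "%m/%d/%Y")]

# Key by Name, then ActivityDate; this reorders the rows in place
setkey(toy, Name, ActivityDate)

key(toy)   # the key columns: Name, ActivityDate
toy        # rows are now sorted by Name, then by date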

Create two temporary data sets, one for sales and one for every other activity type

Saletemp <- df[ActivityType == "Sale", .(Name, ActivityDate)]
Elsetemp <- df[ActivityType != "Sale", .(Name, ActivityDate, ActivityType)]

Join onto the sales temporary data set with a rolling window of 21 days, checking the conditions along the way

Saletemp[Elsetemp, `:=`(Email21 = as.logical(which(i.ActivityType == "Email")), 
                        Webinar21 = as.logical(which(i.ActivityType == "Webinar"))), 
         roll = -21, by = .EACHI]
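To see what roll = -21 does in isolation, here is a minimal sketch on toy data. The tables sales and events and the helper column SaleDate are made up for illustration, and the join columns are given with on = instead of keys. With a negative roll, each row of events is matched to the next sale on or after its date, provided that sale is no more than 21 days away; otherwise the row gets NA.

# Toy data, for illustration only
library(data.table)

sales <- data.table(
  Name         = "John",
  ActivityDate = as.IDate(c("2014-01-20", "2014-04-01", "2014-07-01"))
)
sales[, SaleDate := ActivityDate]  # copy of the date, so we can see which sale matched

events <- data.table(
  Name         = "John",
  ActivityDate = as.IDate(c("2014-01-01", "2014-01-05", "2014-03-25", "2014-05-01")),
  ActivityType = c("Email", "Webinar", "Webinar", "Email")
)

# For each event, look forward for the next sale within 21 days
matched <- sales[events, on = c("Name", "ActivityDate"), roll = -21]
matched
# The first two events roll to the sale on 2014-01-20, the webinar on
# 2014-03-25 rolls to the sale on 2014-04-01, and the email on 2014-05-01
# has no sale within 21 days, so its SaleDate is NA.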

Join everything back together

df[Saletemp, `:=`(Email21 = i.Email21, Webinar21 = i.Webinar21)]
df
#     Name ActivityType ActivityDate Email21 Webinar21
#  1: John        Email   2014-01-01      NA        NA
#  2: John      Webinar   2014-01-05      NA        NA
#  3: John         Sale   2014-01-20    TRUE      TRUE
#  4: John      Webinar   2014-03-25      NA        NA
#  5: John         Sale   2014-04-01      NA      TRUE
#  6: John         Sale   2014-07-01      NA        NA
#  7:  Tom        Email   2015-01-01      NA        NA
#  8:  Tom      Webinar   2015-01-05      NA        NA
#  9:  Tom         Sale   2015-01-20    TRUE      TRUE
# 10:  Tom      Webinar   2015-03-25      NA        NA
# 11:  Tom         Sale   2015-04-01      NA      TRUE
# 12:  Tom         Sale   2015-07-01      NA        NA
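The output above reports TRUE/NA, whereas the question asked for Yes/No on every sale row. As a cross-check, here is a rough base R sketch that produces the Yes/No columns directly. It is not the rolling-join method from this answer but a plain loop over customers; the helper flag_recent is hypothetical, and it assumes that "last 21 days" means 0 to 21 days before the sale, same day included.

# Rebuild the example data from the question
df <- data.frame(
  Name = rep(c("John", "Tom"), each = 6),
  ActivityType = rep(c("Email", "Webinar", "Sale", "Webinar", "Sale", "Sale"), 2),
  ActivityDate = as.Date(
    c("1/1/2014", "1/5/2014", "1/20/2014", "3/25/2014", "4/1/2014", "7/1/2014",
      "1/1/2015", "1/5/2015", "1/20/2015", "3/25/2015", "4/1/2015", "7/1/2015"),
    format = "%m/%d/%Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one customer, return "Yes"/"No" on each Sale row
# depending on whether an activity of `type` occurred 0..window days earlier,
# and NA on every non-Sale row
flag_recent <- function(types, dates, type, window = 21) {
  event_dates <- dates[types == type]
  vapply(seq_along(dates), function(i) {
    if (types[i] != "Sale") return(NA_character_)
    gap <- as.numeric(dates[i] - event_dates)  # days between event and sale
    if (any(gap >= 0 & gap <= window)) "Yes" else "No"
  }, character(1))
}

df$Email21   <- NA_character_
df$Webinar21 <- NA_character_
for (nm in unique(df$Name)) {
  rows <- df$Name == nm
  df$Email21[rows]   <- flag_recent(df$ActivityType[rows], df$ActivityDate[rows], "Email")
  df$Webinar21[rows] <- flag_recent(df$ActivityType[rows], df$ActivityDate[rows], "Webinar")
}
df

On the example data this reproduces the two target columns from the question. The loop compares every sale with every event for the same customer, so it is only meant for small data; the rolling join above should scale better.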