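Sorry for the long question. I will do my best to make my goal clear.
I want to add dummy variables to a data.table using the update method, as in the already answered question linked here, but my case is a bit more complicated.
To describe it better, I created some data.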
library(data.table)
DT <- data.table(UID = paste0("UID",rep(1:5,each=2)),
                 date = as.IDate(c("2012-01-01","2012-01-02","2012-01-03","2012-01-04","2012-01-05","2012-01-06","2012-02-01","2012-02-02","2012-02-03","2012-02-04")),
                 value = c(1:10))
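DT is a data.table containing UID, date, and value. The original data has the same structure but covers a longer time span (2 years).
Here, I want to add dummies based on the date.
The dates contain several special time spans, which we can think of as vacations.
For example, in the fake data I created above,
there are two vacations.
I want to add two types of dummies.
The expected results look like this: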
UID   Date      Val  D_length_2  D_length_4
UID1  1/1/2012  1    FALSE       FALSE
UID2  1/2/2012  2    FALSE       TRUE
UID3  1/3/2012  3    FALSE       TRUE
UID4  1/4/2012  4    FALSE       TRUE
UID5  1/5/2012  5    FALSE       TRUE
UID1  1/6/2012  6    FALSE       FALSE
UID2  2/1/2012  7    TRUE        FALSE
UID3  2/2/2012  8    TRUE        FALSE
UID4  2/3/2012  9    FALSE       FALSE
UID5  2/4/2012  10   FALSE       FALSE

UID   Date      Val  Before  After
UID1  1/1/2012  1    TRUE    FALSE
UID2  1/2/2012  2    FALSE   FALSE
UID3  1/3/2012  3    FALSE   FALSE
UID4  1/4/2012  4    FALSE   FALSE
UID5  1/5/2012  5    FALSE   FALSE
UID1  1/6/2012  6    FALSE   TRUE
UID2  2/1/2012  7    TRUE    FALSE
UID3  2/2/2012  8    FALSE   FALSE
UID4  2/3/2012  9    FALSE   FALSE
UID5  2/4/2012  10   FALSE   TRUE
So the combined expected result looks like this:
UID   Date      Val  Before  After  D_length_2  D_length_4
UID1  1/1/2012  1    TRUE    FALSE  FALSE       FALSE
UID2  1/2/2012  2    FALSE   FALSE  FALSE       TRUE
UID3  1/3/2012  3    FALSE   FALSE  FALSE       TRUE
UID4  1/4/2012  4    FALSE   FALSE  FALSE       TRUE
UID5  1/5/2012  5    FALSE   FALSE  FALSE       TRUE
UID1  1/6/2012  6    FALSE   TRUE   FALSE       FALSE
UID2  2/1/2012  7    TRUE    FALSE  FALSE       FALSE
UID3  2/2/2012  8    FALSE   FALSE  TRUE        FALSE
UID4  2/3/2012  9    FALSE   FALSE  TRUE        FALSE
UID5  2/4/2012  10   FALSE   TRUE   FALSE       FALSE
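The full data set has more than 10M rows, with about 10 different vacations and 4 different vacation lengths.
For the second type of dummy, I tried: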
f <- function(x){
ifelse(x %in% as.Date(c("2012-01-02","2012-02-02")) - 1, return(TRUE), return(FALSE))
}
DT[,Before:= f(date)]
But this does not seem to give the correct result.
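A likely cause, offered as a hedged note: calling `return()` inside the arguments of `ifelse()` exits the enclosing function as soon as one branch is evaluated, so the function yields a single value rather than one value per row, and `:=` then recycles it. Since `%in%` is already vectorized, no `ifelse()` is needed. The sketch below uses base `Date` vectors and the illustrative names `f_scalar`, `f_vec`, `dates`, and `vac_start`, none of which come from the original post.

```r
# Small illustrative inputs (not the full example data).
dates     <- as.Date(c("2012-01-01", "2012-01-02", "2012-02-01", "2012-02-02"))
vac_start <- as.Date(c("2012-01-02", "2012-02-02"))

# Same pattern as in the question: return() fires while ifelse() is
# evaluating its arguments, so the function exits with one value.
f_scalar <- function(x) {
  ifelse(x %in% (vac_start - 1), return(TRUE), return(FALSE))
}
n_returned <- length(f_scalar(dates))

# Vectorized alternative: %in% returns one logical per element of x.
f_vec <- function(x) x %in% (vac_start - 1)
res <- f_vec(dates)
```

With a function like `f_vec`, `DT[, Before := f_vec(date)]` would receive one logical per row.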
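For the first type, I have not come up with a good solution.
This question is about updating in data.table; any ideas on how to approach it and how to write the update function are very welcome!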
Here is a start:
library(data.table)
DT <- data.table(UID = paste0("UID",rep(1:5,each=2)),
date = as.IDate(c("2012-01-01","2012-01-02","2012-01-03","2012-01-04","2012-01-05","2012-01-06","2012-02-01","2012-02-02","2012-02-03","2012-02-04")),
value = c(1:10))
setkey(DT, date)
vacStart <- data.table(start = as.IDate(c("2012-01-02", "2012-02-02")), key="start")
vacEnd <- data.table(date = as.IDate(c("2012-01-05", "2012-02-03")), key="date")
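#identify vacations:
# roll=TRUE carries the index of the most recent vacation start forward to each date
vacStart[, Start:=.I]
DT <- vacStart[DT, roll=TRUE]
# roll=-Inf carries the index of the next vacation end backward to each date
vacEnd[, End:=.I]
DT <- vacEnd[DT, roll=-Inf]
# a date lies inside a vacation when both indices agree; 0 marks non-vacation days
DT[,vac:=(End==Start)*Start]
DT[is.na(vac), vac:=0L]
#2-day vacations:
DT[,length_2 := (.N==2) & vac!=0, by=vac]
#days before vacation
# vac rises from 0 on the row just before a vacation starts
DT[,before := c(diff(vac)>0, FALSE) & vac==0]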
# date End Start UID value vac length_2 before
# 1: 2012-01-01 1 NA UID1 1 0 FALSE TRUE
# 2: 2012-01-02 1 1 UID1 2 1 FALSE FALSE
# 3: 2012-01-03 1 1 UID2 3 1 FALSE FALSE
# 4: 2012-01-04 1 1 UID2 4 1 FALSE FALSE
# 5: 2012-01-05 1 1 UID3 5 1 FALSE FALSE
# 6: 2012-01-06 2 1 UID3 6 0 FALSE FALSE
# 7: 2012-02-01 2 1 UID4 7 0 FALSE TRUE
# 8: 2012-02-02 2 2 UID4 8 2 TRUE FALSE
# 9: 2012-02-03 2 2 UID5 9 2 TRUE FALSE
# 10: 2012-02-04 NA 2 UID5 10 0 FALSE FALSE
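The start above leaves the `After` and `D_length_4` dummies open. One possible way to get all four columns, sketched here under assumptions rather than as the answerer's method, is to keep one row per vacation in a lookup table and tag the rows of `DT` with a non-equi update join. The table `vacations` and the helper columns `len` and `vac_len` are hypothetical names; the vacation dates are the ones used in `vacStart` and `vacEnd`, and vacations are assumed not to overlap.

```r
library(data.table)

DT <- data.table(
  UID   = paste0("UID", rep(1:5, each = 2)),
  date  = as.IDate(c("2012-01-01", "2012-01-02", "2012-01-03", "2012-01-04",
                     "2012-01-05", "2012-01-06", "2012-02-01", "2012-02-02",
                     "2012-02-03", "2012-02-04")),
  value = 1:10
)

# Hypothetical lookup table: one row per vacation.
vacations <- data.table(
  start = as.IDate(c("2012-01-02", "2012-02-02")),
  end   = as.IDate(c("2012-01-05", "2012-02-03"))
)
# Vacation length in days, counting both endpoints.
vacations[, len := as.integer(end) - as.integer(start) + 1L]

# Non-equi update join: rows of DT that fall inside a vacation get that
# vacation's length; rows outside every vacation stay NA.
DT[vacations, on = .(date >= start, date <= end), vac_len := i.len]

# One logical dummy per vacation length of interest.
for (k in c(2L, 4L)) {
  DT[, (paste0("D_length_", k)) := !is.na(vac_len) & vac_len == k]
}

# Day before each vacation start and day after each vacation end.
DT[, Before := date %in% (vacations$start - 1L)]
DT[, After  := date %in% (vacations$end + 1L)]

# Drop the helper column.
DT[, vac_len := NULL]
```

All columns are added by reference with `:=`, so `DT` is not copied; with more vacations or lengths, only the rows of `vacations` and the vector in the `for` loop would need to change.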