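I'm working with a data frame that holds daily data for 444 days. I have several variables that I want to lag for use in a regression model (lm), and I want to lag each of them 7 times. I'm currently generating the lags like this: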
email_data$email_reach1 <- lag(ts(email_data$email_reach, start = 1, end = 444), 1)
email_data$email_reach2 <- lag(ts(email_data$email_reach, start = 1, end = 444), 2)
email_data$email_reach3 <- lag(ts(email_data$email_reach, start = 1, end = 444), 3)
email_data$email_reach4 <- lag(ts(email_data$email_reach, start = 1, end = 444), 4)
email_data$email_reach5 <- lag(ts(email_data$email_reach, start = 1, end = 444), 5)
email_data$email_reach6 <- lag(ts(email_data$email_reach, start = 1, end = 444), 6)
email_data$email_reach7 <- lag(ts(email_data$email_reach, start = 1, end = 444), 7)
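I then repeat this for every variable I want to lag.
This seems like a terrible way to do it. Is there something better?
I have thought about lagging the entire data frame, but I don't know how to assign variable names to the result and merge it back into the original data frame.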
You can also use data.table. (HT to @akrun)
set.seed(1)
email_data <- data.frame(dates=1:10, email_reach=rbinom(10, 10, 0.5))
library(data.table)
setDT(email_data)[, paste0('email_reach', 1:3) := shift(email_reach, 1:3)][]
#     dates email_reach email_reach1 email_reach2 email_reach3
#  1:     1           4           NA           NA           NA
#  2:     2           4            4           NA           NA
#  3:     3           5            4            4           NA
#  4:     4           7            5            4            4
#  5:     5           4            7            5            4
#  6:     6           7            4            7            5
#  7:     7           7            7            4            7
#  8:     8           6            7            7            4
#  9:     9           6            6            7            7
# 10:    10           3            6            6            7
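Since the question asks about lagging several variables at once, the same idea can probably be extended by passing several columns to shift() through .SD. This is only a sketch: sms_reach is a hypothetical second variable invented for illustration, and the names are built on the assumption that shift() returns all lags of the first column, then all lags of the second, and so on.

```r
library(data.table)

set.seed(1)
# Toy data; `sms_reach` is a hypothetical second variable added for illustration.
email_data <- data.table(
  dates       = 1:10,
  email_reach = rbinom(10, 10, 0.5),
  sms_reach   = rbinom(10, 10, 0.5)
)

vars <- c("email_reach", "sms_reach")  # columns to lag
lags <- 1:3                            # lag orders to create

# shift() on several columns returns one flat list, column by column,
# so the new names are built in that same order.
new_names <- paste0(rep(vars, each = length(lags)), lags)

# Add all lagged columns by reference in a single step.
email_data[, (new_names) := shift(.SD, lags), .SDcols = vars]
```

For the case in the question, lags would be 1:7 and vars would list every column that needs lagging.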
I think this works for any given n.
n <- 7
for (i in 1:n) {
email_data[[paste0("email_reach", i)]] <- lag(ts(email_data$email_reach, start = 1, end = 444), i)
}
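One caveat with the loop above: as far as I know, stats::lag() on a ts object only moves the time index and leaves the values where they are, so once the result is stored as a data frame column the "lagged" columns may be identical to the original. A minimal base R sketch that shifts the values themselves is below. lag_vec is a hypothetical helper, not a base R function, and the toy data stands in for the real 444-row data frame.

```r
# Hypothetical helper: shift a vector down by k positions, padding the top with NA.
lag_vec <- function(x, k) c(rep(NA, k), x)[seq_along(x)]

set.seed(1)
# Toy data standing in for the real data frame.
email_data <- data.frame(dates = 1:10, email_reach = rbinom(10, 10, 0.5))

n <- 3  # number of lags; the question uses 7
for (i in 1:n) {
  email_data[[paste0("email_reach", i)]] <- lag_vec(email_data$email_reach, i)
}
```

With its default na.action, lm() drops the rows that contain NA, so the first n rows would not take part in the fit.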