Creating a "yesterday's value" variable for multiple time series

Dan*_*wen 2 performance r time-series

I'm working on a project in R and I'm a bit stuck. I have four time series in this format:

x <- data.frame(Id = rep(c(1,2,3,4),2), 
                Date = c(rep("1980-01-01",4), rep("1980-01-02",4)),
                Freq = c(2,3,1,2,4,5,2,3))

Id        Date        Freq
1      1980-01-01      2
2      1980-01-01      3
3      1980-01-01      1
4      1980-01-01      2
1      1980-01-02      4
2      1980-01-02      5
3      1980-01-02      2
4      1980-01-02      3

My goal is to create a new variable that is simply that group's Freq value from the previous day.

Id        Date        Freq   YestFreq
1      1980-01-01      2       NA
2      1980-01-01      3       NA
3      1980-01-01      1       NA
4      1980-01-01      2       NA
1      1980-01-02      4       2
2      1980-01-02      5       3
3      1980-01-02      2       1
4      1980-01-02      3       2

My attempted solution is:

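# Build a lookup key for each row and for the same Id on the previous day
x$DateID = paste(x$Id, x$Date)
x$yesterday = as.Date(x$Date) - 1
x$YesterdayDateID = paste(x$Id, x$yesterday)

# For each row, find the row whose key equals this row's previous-day key
result = numeric(nrow(x))
for(i in seq_len(nrow(x))){
  answer = x$Freq[which(x$DateID == x$YesterdayDateID[i])]
  if(length(answer) != 0){result[i] = answer} else{result[i] = NA}
}
x = cbind(x, result)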

My actual dataset has ~600,000 rows (~300 Ids and ~2,000 unique dates), so my solution above takes 2 hours to run. Any help would be greatly appreciated.

Pie*_*une 5

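This accounts for possible gaps in the dates. I use match to find the row for the previous day, then use that index to subset the target column, grouped by Id: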

data.table

library(data.table)
setDT(x)[, Date := as.IDate(Date)][
, YestFreq := Freq[match(Date-1L, Date)], by=Id][]
#   Id       Date Freq YestFreq
# 1:  1 1980-01-01    2       NA
# 2:  2 1980-01-01    3       NA
# 3:  3 1980-01-01    1       NA
# 4:  4 1980-01-01    2       NA
# 5:  1 1980-01-02    4        2
# 6:  2 1980-01-02    5        3
# 7:  3 1980-01-02    2        1
# 8:  4 1980-01-02    3        2
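To illustrate how the match lookup behaves when a day is missing, here is a small base R sketch on a single series. The vectors dates and freq are made-up example data (not from the question), with 1980-01-03 deliberately left out:

```r
# Hypothetical single series with a gap: 1980-01-03 is missing
dates <- as.Date(c("1980-01-01", "1980-01-02", "1980-01-04"))
freq  <- c(2, 4, 7)

# For each date, find the position of the previous calendar day (NA if absent)
idx <- match(dates - 1L, dates)
idx
# [1] NA  1 NA

# Indexing with NA yields NA, so rows without a "yesterday" get NA
yest <- freq[idx]
yest
# [1] NA  2 NA
```

The last row gets NA rather than 4, because 1980-01-03 does not exist in the series. A plain positional lag would have returned the value from two days earlier instead.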

dplyr

library(dplyr)
x$Date <- as.Date(x$Date)
x %>% group_by(Id) %>% mutate(YestFreq = Freq[match(Date - 1L, Date)])
#   Id       Date Freq YestFreq
# 1  1 1980-01-01    2       NA
# 2  2 1980-01-01    3       NA
# 3  3 1980-01-01    1       NA
# 4  4 1980-01-01    2       NA
# 5  1 1980-01-02    4        2
# 6  2 1980-01-02    5        3
# 7  3 1980-01-02    2        1
# 8  4 1980-01-02    3        2
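For comparison, the key-based idea from the question can also be written without a loop in base R. This is only a sketch, assuming each (Id, Date) pair appears at most once; the names key and prev_key are introduced here for illustration. It replaces the per-row which() scan with a single match() call over all rows:

```r
x <- data.frame(Id = rep(c(1, 2, 3, 4), 2),
                Date = as.Date(c(rep("1980-01-01", 4), rep("1980-01-02", 4))),
                Freq = c(2, 3, 1, 2, 4, 5, 2, 3))

# One key per row, and one key for "same Id, previous day"
key      <- paste(x$Id, x$Date)
prev_key <- paste(x$Id, x$Date - 1L)

# A single vectorized lookup; rows with no previous day get NA
x$YestFreq <- x$Freq[match(prev_key, key)]
x$YestFreq
# [1] NA NA NA NA  2  3  1  2
```

If an (Id, Date) pair were duplicated, match() would return only the first matching row, so that assumption is worth checking on the real data first.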