Creating multiple lead variables in data.table

Whi*_*ard 1 r data.table

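This question is similar to "Create a bunch of lagged variables in data.table at once" and "How to create a lag variable within each group?", but as far as I can tell it is not quite the same.

I would like to create several lead variables, e.g. lead1, lead2 and lead3 below, grouped by groups.

Sample data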

require(data.table)
set.seed(1)
data <- data.table(time =c(1:10,1:8),groups = c(rep(c("a","b"),c(10,8))), 
   value = rnorm(18))
data
    time groups       value
 1:    1      a -0.62645381
 2:    2      a  0.18364332
 3:    3      a -0.83562861
 4:    4      a  1.59528080
 5:    5      a  0.32950777
 6:    6      a -0.82046838
 7:    7      a  0.48742905
 8:    8      a  0.73832471
 9:    9      a  0.57578135
10:   10      a -0.30538839
11:    1      b  1.51178117
12:    2      b  0.38984324
13:    3      b -0.62124058
14:    4      b -2.21469989
15:    5      b  1.12493092
16:    6      b -0.04493361
17:    7      b -0.01619026
18:    8      b  0.94383621

The resulting data table should be

   time groups       value       lead1       lead2       lead3
1     1      a -0.62645381  0.18364332 -0.83562861  1.59528080
2     2      a  0.18364332 -0.83562861  1.59528080  0.32950777
3     3      a -0.83562861  1.59528080  0.32950777 -0.82046838
4     4      a  1.59528080  0.32950777 -0.82046838  0.48742905
5     5      a  0.32950777 -0.82046838  0.48742905  0.73832471
6     6      a -0.82046838  0.48742905  0.73832471  0.57578135
7     7      a  0.48742905  0.73832471  0.57578135 -0.30538839
8     8      a  0.73832471  0.57578135 -0.30538839          NA
9     9      a  0.57578135 -0.30538839          NA          NA
10   10      a -0.30538839          NA          NA          NA
11    1      b  1.51178117  0.38984324 -0.62124058 -2.21469989
12    2      b  0.38984324 -0.62124058 -2.21469989  1.12493092 
13    3      b -0.62124058 -2.21469989  1.12493092 -0.04493361
14    4      b -2.21469989  1.12493092 -0.04493361 -0.01619026
15    5      b  1.12493092 -0.04493361 -0.01619026  0.94383621
16    6      b -0.04493361 -0.01619026  0.94383621          NA
17    7      b -0.01619026  0.94383621          NA          NA
18    8      b  0.94383621          NA          NA          NA

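Note that my actual dataset is much larger, and I may need more than 3 lead variables.

I am using data.table version 1.9.4 and I am not sure when I will be able to update to the latest version, so a solution that works with this version would be a bonus. Sorry for the extra constraint.

Thanks in advance.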

Dav*_*urg 8

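The standard data.table approach is to use the built-in shift function (as already mentioned in the linked threads). You need the latest stable version on CRAN, v1.9.6+.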

library(data.table) # V1.9.6+
data[, paste0("lead", 1L:3L) := shift(value, 1L:3L, type = "lead"), by = groups]
data
#     time groups       value       lead1       lead2       lead3
#  1:    1      a -0.62645381  0.18364332 -0.83562861  1.59528080
#  2:    2      a  0.18364332 -0.83562861  1.59528080  0.32950777
#  3:    3      a -0.83562861  1.59528080  0.32950777 -0.82046838
#  4:    4      a  1.59528080  0.32950777 -0.82046838  0.48742905
#  5:    5      a  0.32950777 -0.82046838  0.48742905  0.73832471
#  6:    6      a -0.82046838  0.48742905  0.73832471  0.57578135
#  7:    7      a  0.48742905  0.73832471  0.57578135 -0.30538839
#  8:    8      a  0.73832471  0.57578135 -0.30538839          NA
#  9:    9      a  0.57578135 -0.30538839          NA          NA
# 10:   10      a -0.30538839          NA          NA          NA
# 11:    1      b  1.51178117  0.38984324 -0.62124058 -2.21469989
# 12:    2      b  0.38984324 -0.62124058 -2.21469989  1.12493092
# 13:    3      b -0.62124058 -2.21469989  1.12493092 -0.04493361
# 14:    4      b -2.21469989  1.12493092 -0.04493361 -0.01619026
# 15:    5      b  1.12493092 -0.04493361 -0.01619026  0.94383621
# 16:    6      b -0.04493361 -0.01619026  0.94383621          NA
# 17:    7      b -0.01619026  0.94383621          NA          NA
# 18:    8      b  0.94383621          NA          NA          NA
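If upgrading is not an option, the same result can be sketched without shift, using plain vector indexing inside each group. This is only a minimal sketch: lead_by, n_leads and lead_cols are hypothetical names introduced here for illustration, not part of data.table, and I have not tested it on 1.9.4 specifically. It relies on the base R behaviour that indexing an atomic vector past its end returns NA.

```r
library(data.table)

# Hypothetical helper (not part of data.table): move x forward by n positions.
# Indexing past the end of an atomic vector yields NA, which pads the tail.
lead_by <- function(x, n) x[seq_along(x) + n]

# Same sample data as in the question
set.seed(1)
data <- data.table(time = c(1:10, 1:8),
                   groups = rep(c("a", "b"), c(10, 8)),
                   value = rnorm(18))

# Assumed parameter: how many lead columns to create
n_leads <- 3L
lead_cols <- paste0("lead", seq_len(n_leads))

# One lead column per offset, computed separately within each group
data[, (lead_cols) := lapply(seq_len(n_leads), function(n) lead_by(value, n)),
     by = groups]
data
```

Changing n_leads is enough to get more lead columns. On v1.9.6+ the shift version above is the one to prefer.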