Simple lookup by ID and time in R

Tim*_*cht 2 lookup r data.table

I have a dataset that looks like this:

library(data.table)
set.seed(1234)
DT <- data.table(id = rep(c("a","b","c","d"), 5),
           year = rep(seq(from = 2010.5, to = 2012.5, by = .5), each = 4),
           value = rnorm(20, 10, 1))
DT
     id   year     value
 1:  a 2010.5  8.792934
 2:  b 2010.5 10.277429
 3:  c 2010.5 11.084441
 4:  d 2010.5  7.654302
 5:  a 2011.0 10.429125
 6:  b 2011.0 10.506056
 7:  c 2011.0  9.425260
 8:  d 2011.0  9.453368
 9:  a 2011.5  9.435548
10:  b 2011.5  9.109962
11:  c 2011.5  9.522807
12:  d 2011.5  9.001614
13:  a 2012.0  9.223746
14:  b 2012.0 10.064459
15:  c 2012.0 10.959494
16:  d 2012.0  9.889715
17:  a 2012.5  9.488990
18:  b 2012.5  9.088805
19:  c 2012.5  9.162828
20:  d 2012.5 12.415835

I want to add three very similar columns, value_previous_6m, value_previous_y and value_next_y, computed per id. Row 10 should then look like this:

id   year    value value_previous_6m value_previous_y value_next_y
b  2011.5  9.109962    10.50606         10.27743        9.088805

I would like to avoid plyr functions, because the full dataset is very large.

Many thanks, Tim

Edit: I know this can be done with the merge function:

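library(data.table)
set.seed(1234)
DT <- data.table(id = rep(c("a","b","c","d"), 5),
           year = rep(seq(from = 2010.5, to = 2012.5, by = .5), each = 4),
           value = rnorm(20, 10, 1))
DT6mp <- copy(DT)
DT12mp <- copy(DT)
# shift the year of the lookup copy by half a year, then match it back on (id, year);
# after the merge, value6mp on row (id, year) is the value recorded at year + 0.5
DT6mp[, year := year - .5]
setkey(DT6mp, id, year); setkey(DT, id, year); setnames(DT6mp, "value", "value6mp")
DT <- merge(DT, DT6mp, by = c("id", "year"), all.x = TRUE, all.y = FALSE, allow.cartesian = TRUE)
# same again with a one-year shift
DT12mp[, year := year - 1]
setkey(DT12mp, id, year); setkey(DT, id, year); setnames(DT12mp, "value", "value12mp")
DT <- merge(DT, DT12mp, by = c("id", "year"), all.x = TRUE, all.y = FALSE, allow.cartesian = TRUE)
DT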

But I think there should be a better way.

Kha*_*haa 6

You can use `shift` from the devel version of `data.table`; it has two options, `lag` and `lead`.

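library(data.table) # v >= 1.9.5
# library(devtools)
# install_github("Rdatatable/data.table", build_vignettes = FALSE)
# shift(value, 1:2) returns a list of two lags; in current releases a single offset returns
# a plain vector, so the lead is wrapped in list() to give c() exactly three columns
DT[, c(paste0('val_previous_', c('6m', 'y')), "val_next_y") := c(shift(value, 1:2), list(shift(value, 2, type = "lead"))), by = id]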
 #   id   year     value val_previous_6m val_previous_y val_next_y
 #1:  a 2010.5  8.792934              NA             NA   9.435548
 #2:  b 2010.5 10.277429              NA             NA   9.109962
 #3:  c 2010.5 11.084441              NA             NA   9.522807
 #4:  d 2010.5  7.654302              NA             NA   9.001614
 #5:  a 2011.0 10.429125        8.792934             NA   9.223746
 #6:  b 2011.0 10.506056       10.277429             NA  10.064459
 #7:  c 2011.0  9.425260       11.084441             NA  10.959494
 #8:  d 2011.0  9.453368        7.654302             NA   9.889715
 #9:  a 2011.5  9.435548       10.429125       8.792934   9.488990
 #10:  b 2011.5  9.109962       10.506056      10.277429   9.088805
 #11:  c 2011.5  9.522807        9.425260      11.084441   9.162828
 #12:  d 2011.5  9.001614        9.453368       7.654302  12.415835
 #13:  a 2012.0  9.223746        9.435548      10.429125         NA
 #14:  b 2012.0 10.064459        9.109962      10.506056         NA
 #15:  c 2012.0 10.959494        9.522807       9.425260         NA
 #16:  d 2012.0  9.889715        9.001614       9.453368         NA
 #17:  a 2012.5  9.488990        9.223746       9.435548         NA
 #18:  b 2012.5  9.088805       10.064459       9.109962         NA
 #19:  c 2012.5  9.162828       10.959494       9.522807         NA
 #20:  d 2012.5 12.415835        9.889715       9.001614         NA
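To make the lag and lead semantics concrete, here is a minimal sketch on a toy vector. The vector `x` is made up for illustration and is not part of the question's data; the sketch assumes a `data.table` release in which `shift` is available.

```r
library(data.table)

x <- c(10, 20, 30, 40, 50)  # toy vector, assumed for illustration only

lag1  <- shift(x, 1)                 # previous element; the first position has no predecessor
lead2 <- shift(x, 2, type = "lead")  # element two positions ahead; the last two have none
lags  <- shift(x, 1:2)               # a vector of offsets returns a list, one element per offset

print(lag1)   # NA 10 20 30 40
print(lead2)  # 30 40 50 NA NA
print(lags)
```

With `by = id`, the same positional shifts are applied within each id separately, which is why one step back corresponds to six months and two steps to one year in the half-yearly data above.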

A deliberately longer version, to avoid mistakes:

DT[, value_previous_6m := shift(value, 1), by=id
   ][, value_previous_y:= shift(value, 2), by=id
   ][, value_next_y:= shift(value, 2, type="lead"), by=id]
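Note that `shift` works by row position, so the approach above relies on the rows being ordered by year within each id and on no half-year being missing. If that cannot be guaranteed, a join on the shifted year is one alternative. The following is only a sketch under stated assumptions: `lookup_value` is a hypothetical helper name (not a `data.table` function), each (id, year) pair is assumed to occur at most once, and the year values are assumed to be exact multiples of 0.5 so that an equality join on them is safe.

```r
library(data.table)

set.seed(1234)
DT <- data.table(id = rep(c("a", "b", "c", "d"), 5),
                 year = rep(seq(from = 2010.5, to = 2012.5, by = 0.5), each = 4),
                 value = rnorm(20, 10, 1))

# Hypothetical helper (not part of data.table): for every row of dt, return the
# value stored for the same id at year + offset, or NA when no such row exists.
lookup_value <- function(dt, offset) {
  keys <- dt[, .(id, year = year + offset)]  # the (id, year) pairs to look up
  dt[keys, on = c("id", "year"), value]      # join in the order of keys; no match gives NA
}

# Compute all three lookups first, then add them by reference in one step.
prev_6m <- lookup_value(DT, -0.5)
prev_y  <- lookup_value(DT, -1)
next_y  <- lookup_value(DT, 1)
DT[, c("value_previous_6m", "value_previous_y", "value_next_y") := list(prev_6m, prev_y, next_y)]

DT[id == "b" & year == 2011.5]
```

Because the match is made on the year itself rather than on row position, the result does not depend on how the table is sorted, and a missing period yields NA instead of silently picking up the wrong row.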

  • @Tim_Utrecht The `shift` function is available in `data.table v1.9.5`. You can install it from GitHub: `library(devtools); install_github("Rdatatable/data.table", build_vignettes = FALSE)` (2 upvotes)