Consider the following data.table:
library(data.table)
DT <- data.table(year = c(2011,2012,2013,2011,2012,2013,2011,2012,2013),
                 level = c(137,137,137,136,136,136,135,135,135),
                 valueIn = c(13,30,56,11,25,60,8,27,51))
I would like the following output:
DT <- data.table(year = c(2011,2012,2013,2011,2012,2013,2011,2012,2013),
                 level = c(137,137,137,136,136,136,135,135,135),
                 valueIn = c(13,30,56, 11,25,60, 8,27,51),
                 valueOut = c(12,27.5,58, 9.5,26,55.5, NA,NA,NA))
In other words, for each year I want to compute (valueIn[level] + valueIn[level-1]) / 2. For example, the first value is calculated as (13 + 11) / 2 = 12.
Currently, I do this with a for loop in which I subset the data.table for each level:
levelDtList <- list()
# Visit each distinct level once, lowest first
levels <- sort(unique(DT$level))
for (this.level in levels) {
  levelDt <- DT[level == this.level]
  if (this.level == min(levels)) {
    # The lowest level has no level below it
    valueOut <- NA_real_
  } else {
    # Rows of the level below, stored on the previous iteration
    levelM1Data <- levelDtList[[this.level - 1]]
    valueOut <- (levelDt$valueIn + levelM1Data$valueIn) / 2
  }
  levelDt$valueOut <- valueOut
  levelDtList[[this.level]] <- levelDt
}
datatable <- rbindlist(levelDtList)
This is ugly and slow, so I am looking for a better, faster, more efficient data.table solution.
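Use the shift function with type = 'lead' to get the next value within each year (the rows of each year are already ordered by decreasing level), add it to the current value, and divide by 2: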
DT[, valueOut := (valueIn + shift(valueIn, type = 'lead'))/2, by = year]
This gives:
   year level valueIn valueOut
1: 2011   137      13     12.0
2: 2012   137      30     27.5
3: 2013   137      56     58.0
4: 2011   136      11      9.5
5: 2012   136      25     26.0
6: 2013   136      60     55.5
7: 2011   135       8       NA
8: 2012   135      27       NA
9: 2013   135      51       NA
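To see where the NA for level 135 comes from, it may help to look at what shift returns for a single year. The sketch below uses the three 2011 values from the example data; the variable names v and nxt are only illustrative.

```r
library(data.table)

# valueIn for year 2011, in row order: level 137, 136, 135
v <- c(13, 11, 8)

# type = "lead" moves each following element up one position;
# the last position has no successor and is filled with NA
nxt <- shift(v, n = 1L, fill = NA, type = "lead")
nxt            # 11  8 NA

# Average of each value and the one after it
(v + nxt) / 2  # 12.0  9.5   NA
```

Because the last row of each year has no following row, its average is NA, which matches the requested output for the lowest level.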
The same shift call with its n, fill and type arguments written out explicitly:
DT[, valueOut := (valueIn + shift(valueIn, n = 1L, fill = NA, type = 'lead'))/2, by = year]
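The shift approach pairs each row with the next row of the same year, so it depends on the rows being sorted by decreasing level and on there being no gaps between levels. If that cannot be guaranteed, one possible alternative (a different technique from the shift call above, not part of the original answer) is an update join that looks up level - 1 explicitly. This is a minimal sketch; the helper table prev and its column prevValue are hypothetical names introduced here.

```r
library(data.table)

DT <- data.table(year = c(2011,2012,2013,2011,2012,2013,2011,2012,2013),
                 level = c(137,137,137,136,136,136,135,135,135),
                 valueIn = c(13,30,56,11,25,60,8,27,51))

# Lookup table: relabel each row as belonging to the level above it,
# so that joining on (year, level) finds the value of level - 1
prev <- DT[, .(year, level = level + 1, prevValue = valueIn)]

# Update join: rows of DT with a match get the average,
# rows without a match (the lowest level) are left as NA
DT[prev, on = .(year, level), valueOut := (valueIn + i.prevValue) / 2]
```

On the example data this should reproduce the valueOut column shown above, and it does not rely on the row order of DT.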