r lubridate dplyr data.table
Consider the following data frame:
time <- c('2016-04-13 23:07:45', '2016-04-13 23:07:50', '2016-04-13 23:08:45', '2016-04-13 23:08:45',
          '2016-04-13 23:08:45', '2016-04-13 23:07:50', '2016-04-13 23:07:51')
group <- c('A', 'A', 'A', 'B', 'B', 'B', 'B')
value <- c(5, 10, 2, 2, NA, 1, 4)
df <- data.frame(time, group, value)
> df
                 time group value
1 2016-04-13 23:07:45     A     5
2 2016-04-13 23:07:50     A    10
3 2016-04-13 23:08:45     A     2
4 2016-04-13 23:08:45     B     2
5 2016-04-13 23:08:45     B    NA
6 2016-04-13 23:07:50     B     1
7 2016-04-13 23:07:51     B     4
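I want to resample this data frame at a 5-second level within each group, and compute the sum of value for each time interval and group.
Intervals should be closed on the left and open on the right. For example, the first row of the output should be
2016-04-13 23:07:45     A     5 because the first 5-second interval is [2016-04-13 23:07:45, 2016-04-13 23:07:50[
How can I do this in either dplyr or data.table? Do I need lubridate to handle the timestamps?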
With a recent version (1.9.8+) of data.table:
library(data.table)
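# convert to data.table by reference, parse time as POSIXct,
# and add the (exclusive) end of each row's 5-second window
setDT(df)
df[, time := as.POSIXct(time)][, time.5s := time + 5]
# non-equi self-join: for each row, match rows of the same group whose time
# falls in [time, time.5s), then sum their values (ignoring NA)
df[, newval := df[df, on = .(group, time < time.5s, time >= time),
                  sum(value, na.rm = TRUE), by = .EACHI]$V1]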
df
#                  time group value             time.5s newval
#1: 2016-04-13 23:07:45     A     5 2016-04-13 23:07:50      5
#2: 2016-04-13 23:07:50     A    10 2016-04-13 23:07:55     10
#3: 2016-04-13 23:08:45     A     2 2016-04-13 23:08:50      2
#4: 2016-04-13 23:08:45     B     2 2016-04-13 23:08:50      2
#5: 2016-04-13 23:08:45     B    NA 2016-04-13 23:08:50      2
#6: 2016-04-13 23:07:50     B     1 2016-04-13 23:07:55      5
#7: 2016-04-13 23:07:51     B     4 2016-04-13 23:07:56      4
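The join above keeps one row per input row, each with the sum over its own forward-looking window. If one row per fixed 5-second bin is what is wanted instead, flooring the timestamps may be enough. The following is only a minimal sketch and not part of the answer above: it assumes the bins are aligned to multiples of 5 seconds since the epoch and that the timestamps are read as UTC, and the names `dt`, `bin`, `total` and `res` are made up for illustration.

```r
library(data.table)

# Rebuild the example data from the question (timestamps assumed to be UTC)
dt <- data.table(
  time  = as.POSIXct(c('2016-04-13 23:07:45', '2016-04-13 23:07:50',
                       '2016-04-13 23:08:45', '2016-04-13 23:08:45',
                       '2016-04-13 23:08:45', '2016-04-13 23:07:50',
                       '2016-04-13 23:07:51'), tz = "UTC"),
  group = c('A', 'A', 'A', 'B', 'B', 'B', 'B'),
  value = c(5, 10, 2, 2, NA, 1, 4)
)

# Floor each timestamp to the start of its bin, so a row belongs to
# [bin, bin + 5 seconds): closed on the left, open on the right
dt[, bin := as.POSIXct(floor(as.numeric(time) / 5) * 5,
                       origin = "1970-01-01", tz = "UTC")]

# One row per group and bin, sorted by both, summing value and ignoring NA
res <- dt[, .(total = sum(value, na.rm = TRUE)), keyby = .(group, bin)]
res
```

With this data the result has five rows: group B's rows at 23:07:50 and 23:07:51 fall into the same bin, and the NA at 23:08:45 is ignored. Bins that contain no rows do not appear in the result.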