Conditional rolling mean (moving average) on an irregular time series

use*_*045 9 r time-series moving-average

I have data in the following format:

ID    Minutes Value
xxxx  118     3 
xxxx  121     4 
xxxx  122     3 
yyyy  122     6 
xxxx  123     4 
yyyy  123     8 
...   ...     .... 

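Each ID is a patient, and each Value is, for example, that patient's blood pressure for that minute. I want to create a rolling average over the 60 minutes before and the 60 minutes after each point. However, as you can see, some minutes are missing (so I can't simply rely on row numbers), and I want the average computed separately for each unique ID (so the average for ID xxxx must not include values assigned to ID yyyy). It sounds like rollapply or rollingstat might be options, but I haven't had much success piecing it together...

Please let me know if further clarification is needed.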

Ric*_*rta 11

You can easily fill in the missing minutes (Value will be set to NA for them) and then use rollapply:

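library(data.table)
library(zoo)

## Convert to data.table, keyed by ID and Minutes
DT <- data.table(DF, key = c("ID", "Minutes"))

## Missing Minutes will be added in. Value will be set to NA.
DT <- DT[CJ(unique(ID), seq(min(Minutes), max(Minutes)))]

## Run your function over a centered window:
## 60 minutes before + the point itself + 60 minutes after = 121 rows.
## partial = TRUE keeps one result per row by shrinking the window at the edges.
DT[, rollapply(Value, 121, mean, na.rm = TRUE, partial = TRUE), by = ID]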
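To see what the padding step does, here is a small sketch on the six sample rows from the question. It assumes the column names ID, Minutes and Value from the question's table; the names `grid` and `padded` are only illustrative, and the join is written with an explicit `on =` rather than relying on the key.

```r
library(data.table)

# The six sample rows from the question
DT <- data.table(
  ID      = c("xxxx", "xxxx", "xxxx", "yyyy", "xxxx", "yyyy"),
  Minutes = c(118L, 121L, 122L, 122L, 123L, 123L),
  Value   = c(3, 4, 3, 6, 4, 8),
  key     = c("ID", "Minutes")
)

# Every (ID, minute) pair between the overall first and last minute
grid <- CJ(ID = unique(DT$ID), Minutes = seq(min(DT$Minutes), max(DT$Minutes)))

# Join the data onto the grid; minutes with no observation get Value = NA
padded <- DT[grid, on = c("ID", "Minutes")]
print(padded)
```

Note that the grid spans the overall minimum and maximum of Minutes, so every ID is padded over the same range, including minutes before its first or after its last observation.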

Alternatively, if you do not need to keep the padded minutes and their NA values, you can do it all in one step:

## Convert your DF to a data.table
DT <- data.table(DF, key = c("ID", "Minutes"))

## Compute rolling means, with on-the-fly padded minutes
## (121 rows = 60 minutes before + the point itself + 60 minutes after)
DT[CJ(unique(ID), seq(min(Minutes), max(Minutes)))][,
  rollapply(Value, 121, mean, na.rm = TRUE, partial = TRUE), by = ID]
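If padding every minute is too expensive, the same window can be computed directly from the timestamps. The following is a minimal base R sketch of that idea, not the approach used in this answer: `window_mean` is a hypothetical helper, and it compares every pair of rows within an ID, so it is only practical for moderately sized groups.

```r
# Hypothetical helper: for each time t, average the values observed
# within [t - half_width, t + half_width] (no padding required)
window_mean <- function(minutes, value, half_width = 60) {
  vapply(minutes, function(t) {
    mean(value[abs(minutes - t) <= half_width], na.rm = TRUE)
  }, numeric(1))
}

# The six sample rows from the question
DF <- data.frame(
  ID      = c("xxxx", "xxxx", "xxxx", "yyyy", "xxxx", "yyyy"),
  Minutes = c(118, 121, 122, 122, 123, 123),
  Value   = c(3, 4, 3, 6, 4, 8)
)

# ave() runs the function once per ID on that ID's row indices
# and returns the results in the original row order
DF$RollMean <- ave(seq_len(nrow(DF)), DF$ID, FUN = function(idx) {
  window_mean(DF$Minutes[idx], DF$Value[idx])
})
print(DF)
```

Because all sample rows for each ID fall within 60 minutes of each other, every row simply gets its ID's overall mean here; a smaller `half_width` makes the windowing visible.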


dav*_*ers 5

Another approach is to use tidyr/dplyr instead of data.table, and RcppRoll instead of zoo:

library(dplyr)
library(tidyr)
library(RcppRoll)

d %>%
  group_by(ID) %>%
  # add rows for unobserved minutes
  complete(Minutes = full_seq(Minutes, 1)) %>%
  # RcppRoll::roll_mean() is written in C++ for speed;
  # 121 = 60 minutes before + the point itself + 60 minutes after
  mutate(moving_mean = roll_mean(Value, 121, fill = NA, na.rm = TRUE)) %>%
  # keep only the rows that were originally observed
  filter(!is.na(Value))
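One consequence of `fill = NA` is worth checking: rows that sit closer than 60 minutes to either end of an ID's padded series have no full window and come back as NA. The sketch below illustrates this in base R with a hypothetical helper, `roll_mean_full`, which mimics full-window-only behaviour and is not RcppRoll's implementation; the values are made up.

```r
# Hypothetical helper: centered mean over `width` consecutive grid points,
# NA wherever the full window does not fit inside the series
roll_mean_full <- function(x, width) {
  half <- (width - 1) %/% 2
  n <- length(x)
  vapply(seq_len(n), function(i) {
    if (i - half < 1 || i + half > n) return(NA_real_)
    mean(x[(i - half):(i + half)], na.rm = TRUE)
  }, numeric(1))
}

observed <- c(1, 30, 60, 120, 200)          # observed minutes for one ID
grid <- seq(min(observed), max(observed))   # padded 1-minute grid, 1..200
x <- rep(NA_real_, length(grid))
x[match(observed, grid)] <- c(10, 12, 8, 14, 9)  # made-up values

m <- roll_mean_full(x, 121)
result <- data.frame(Minutes = observed,
                     moving_mean = m[match(observed, grid)])
print(result)
```

With these observation times only minute 120 has a complete 121-minute window, so it is the only row with a non-NA mean. If averages over partial windows at the edges are acceptable, an edge-tolerant option such as `partial = TRUE` in zoo's rollapply is one way to get them.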

Data

d <- tibble(
  ID = rep(1:3, each = 5),
  Minutes = rep(c(1, 30, 60, 120, 200), 3),
  Value = rpois(15, lambda = 10)
)