Fast way to count events over a sliding window

use*_*361 4 performance loops r vector count

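Suppose I have x = rnorm(100000). Instead of a moving average over a sliding window of length 1000, I want a sliding window of length 1000 that counts how many times x is above 0.2.

For example,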

x <- rnorm(1004)
start <- 1:1000
record <- list()
while(start[length(start)] <= length(x)) {
    record[[length(record) + 1]] <- length(which(x[start] > 0.2))/length(start)
    start <- start + 1
    print(record[[length(record)]]);flush.console()
}

This is too slow for large length(x). What is an efficient way to do it?

Mar*_*gan 5

My contribution is to compute the lagged difference of the cumulative sum of the condition

cumdiff = function(x) diff(c(0, cumsum( x > .2)), 20)

along with

filt = function(x) filter(x > 0.2, rep(1, 20), sides=1)
library(TTR); ttr = function(x) runSum(x > .2, 20)
cumsub = function(x) { z <- cumsum(c(0, x>0.2)); tail(z,-20) - head(z,-20) }

performs OK:

> library(microbenchmark)
> set.seed(123); xx = rnorm(100000)
> microbenchmark(cumdiff(xx), filt(xx), ttr(xx), cumsub(xx))
Unit: milliseconds
        expr       min        lq    median       uq      max neval
 cumdiff(xx) 11.192005 12.387862 12.469253 12.77588 13.72404   100
    filt(xx) 20.979503 22.058045 22.442765 23.02612 62.91730   100
     ttr(xx)  8.390923 10.023934 10.119772 10.46309 11.04173   100
  cumsub(xx)  7.015654  8.483432  8.538171  8.73596  9.65421   100
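To tie the cumulative-sum idea back to the question's 1000-wide window and proportions, here is a minimal sketch with the window width and threshold as parameters. The name `window_frac` and its arguments are hypothetical, chosen for illustration only; the result is checked against a naive per-window computation on a small input.

```r
# Hypothetical helper (not part of the answer above): fraction of values
# above `thr` in every full window of width `k`, computed from lagged
# differences of a cumulative sum.
window_frac <- function(x, k = 1000, thr = 0.2) {
    z <- cumsum(c(0, x > thr))        # z[i + 1] = number of hits in x[1:i]
    (tail(z, -k) - head(z, -k)) / k   # hits in x[i:(i + k - 1)], divided by k
}

set.seed(1)
x <- rnorm(1004)

# Naive reference, equivalent to the loop in the question
naive <- sapply(seq_len(length(x) - 1000 + 1),
                function(i) mean(x[i:(i + 999)] > 0.2))

stopifnot(isTRUE(all.equal(window_frac(x), naive)))
length(window_frac(x))  # one value per full window
```

The cumulative sum is computed once, so the cost grows with length(x) rather than with length(x) times the window width.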

These differ in the details of how the result is represented (e.g., filt and ttr have leading NAs), and only filter handles embedded NAs:

> xx[22] = NA
> head(cumdiff(xx))  # NA's propagate, silently
[1]  9  9 NA NA NA NA
> ttr(xx)
Error in runSum(x > 0.2, 20) : Series contains non-leading NAs
> tail(filt(xx), -19)
 [1]  9  9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA  8  8  9
 ...
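If an embedded NA should not poison every later count, one option is to treat a missing value as a non-event before taking the cumulative sum. This is an assumption about the desired semantics, not something the answer specifies, and `cumsub_na` is a hypothetical name used only for this sketch.

```r
# Hypothetical variant: count NA as "not above the threshold", so a single
# missing value no longer turns every later window count into NA.
cumsub_na <- function(x, k = 20, thr = 0.2) {
    hit <- !is.na(x) & x > thr        # FALSE & NA is FALSE, so NAs become FALSE
    z <- cumsum(c(0, hit))
    tail(z, -k) - head(z, -k)
}

set.seed(123)
yy <- rnorm(100000)                   # complete data
xx <- yy
xx[22] <- NA                          # same embedded NA as in the example above

res <- cumsub_na(xx)
anyNA(res)                            # FALSE: no NA reaches the result

# Windows that do not contain position 22 match the counts on complete data
ref <- cumsub_na(yy)
stopifnot(identical(res[-(3:22)], ref[-(3:22)]))
```

Whether a missing value should count as a non-event, or whether the affected windows should instead be reported as NA (as filter does), depends on the application.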

  • This is even 25% faster than `ttr`: `f <- function(x) {z <- cumsum(c(0, x>0.2)); tail(z,-20) - head(z,-20)}` (2 upvotes)