用锥形窗口计算运行平均值

Adr*_*ins 6 r running-total mean zoo

给出(虚拟)向量

index=log(seq(10,20,by=0.5))
Run Code Online (Sandbox Code Playgroud)

我想计算具有居中窗口的运行平均值和每端的锥形窗口,即第一个条目保持不变,第二个是窗口大小的平均值3,依此类推,直到达到指定的窗口大小.

这里给出的答案:计算移动平均线,似乎都会产生一个较短的向量,切断窗口太大的起点和终点,例如:

ma <- function(x,n=5){filter(x,rep(1/n,n), sides=2)}

ma(index)

Time Series:
Start = 1 
End = 21 
Frequency = 1 
[1]       NA       NA 2.395822 2.440451 2.483165 2.524124 2.563466 2.601315
[9] 2.637779 2.672957 2.706937 2.739798 2.771611 2.802441 2.832347 2.861383
[17] 2.889599 2.917039 2.943746       NA       NA
Run Code Online (Sandbox Code Playgroud)

同样的

rollmean(index,5)
Run Code Online (Sandbox Code Playgroud)

来自动物园的包裹

有没有一种快速的方法来实现锥形窗口而不需要编写循环编码?

And*_*tar 8

由于rollapply可以说是相当缓慢的,它往往是值得写一个简单的定制功能......

tapermean <- function(x, width=5){
                 taper <- pmin(width,
                               2*(seq_along(x))-1,
                               2*rev(seq_along(x))-1) #set taper pattern
                 lower <- seq_along(x)-(taper-1)/2    #lower index for each mean
                 upper <- lower+taper                 #upper index for each mean
                 x <- c(0, cumsum(x))                 #sum x once
                 return((x[upper]-x[lower])/taper)}   #return means
Run Code Online (Sandbox Code Playgroud)

这比rollapply解决方案快200倍以上......

library(microbenchmark)
index <- log(seq(10,200,by=0.5)) #longer version for testing
w <- c(seq(1,5,2),rep(5,length(index)-5-1),seq(5,1,-2)) #as in Scarabees answer

microbenchmark(tapermean(index),
               rollapply(index,w,mean))

Unit: microseconds
                   expr       min         lq       mean     median        uq       max neval
       tapermean(index)   185.562   193.9405   246.4123   210.6965   284.548   590.197   100
rollapply(index,w,mean) 48213.027 49681.0715 52053.7352 50583.4320 51756.378 97187.538   100
Run Code Online (Sandbox Code Playgroud)

我休息一下!


Sca*_*bee 6

所述width的参数zoo::rollapply可以是一个数值向量.

因此,在您的示例中,您可以使用:

rollapply(index, c(1, 3, 5, rep(5, 15), 5, 3, 1), mean)
#  [1] 2.302585 2.350619 2.395822 2.440451 2.483165 2.524124 2.563466 2.601315 2.637779 2.672957 2.706937 2.739798 2.771611 2.802441 2.832347 2.861383
# [17] 2.889599 2.917039 2.943746 2.970195 2.995732
Run Code Online (Sandbox Code Playgroud)

如果n是一个奇数,一般的解决方案是:

w <- c(seq(1, n, 2), rep(n, length(index) - n - 1), seq(n, 1, -2))
rollapply(index, w, mean)
Run Code Online (Sandbox Code Playgroud)

编辑:如果您关心性能,可以使用自定义Rcpp功能:

library(Rcpp)

cppFunction("NumericVector fasttapermean(NumericVector x, const int window = 5) {
  const int n = x.size();
  NumericVector y(n);

  double s = x[0];
  int w = 1;

  for (int i = 0; i < n; i++) {
    y[i] = s/w;
    if (i < window/2) {
      s += x[i + (w+1)/2] + x[i + (w+3)/2];
      w += 2;
    } else if (i > n - window/2 - 2) {
      s -= x[i - (w-1)/2] + x[i - (w-3)/2];
      w -= 2;
    } else {
      s += x[i + (w+1)/2] - x[i - (w-1)/2];
    }
  }

  return y;
}")
Run Code Online (Sandbox Code Playgroud)

新基准:

n <- 5
index <- log(seq(10, 200, by = .5))
w <- c(seq(1, n, 2), rep(n, length(index) - n - 1), seq(n, 1, -2))

bench::mark(
  fasttapermean(index),
  tapermean(index),
  zoo::rollapply(index, w, mean)
)
# # A tibble: 3 x 14
#   expression                          min     mean   median      max `itr/sec` mem_alloc  n_gc n_itr total_time result      memory              time     gc
#   <chr>                          <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt> <dbl> <int>   <bch:tm> <list>      <list>              <list>   <list>
# 1 fasttapermean(index)              4.7us   5.94us   5.56us   67.6us  168264.     5.52KB     0 10000     59.4ms <dbl [381]> <Rprofmem [2 x 3]>  <bch:tm> <tibble [10,000 x 3]>
# 2 tapermean(index)                 53.9us  79.68us  91.08us  405.8us   12550.    37.99KB     3  5951    474.2ms <dbl [381]> <Rprofmem [16 x 3]> <bch:tm> <tibble [5,954 x 3]>
# 3 zoo::rollapply(index, w, mean)   12.8ms  15.42ms  14.31ms   29.2ms      64.9  100.58KB     8    23    354.7ms <dbl [381]> <Rprofmem [44 x 3]> <bch:tm> <tibble [31 x 3]>
Run Code Online (Sandbox Code Playgroud)

但是,如果您关心(极端)精度,则应使用该rollapply方法,因为meanR 的内置算法比天真的求和除法更准确.

另请注意,该rollapply方法是唯一允许您na.rm = TRUE根据需要使用的方法.

  • 这种变化也可以起作用:`odd < - 2*seq_along(index) - 1; w < - pmin(odd,5,rev(odd))` (2认同)