Adr*_*ins 6 r running-total mean zoo
给出(虚拟)向量
index=log(seq(10,20,by=0.5))
Run Code Online (Sandbox Code Playgroud)
我想计算具有居中窗口的运行平均值和每端的锥形窗口,即第一个条目保持不变,第二个是窗口大小的平均值3,依此类推,直到达到指定的窗口大小.
这里给出的答案:计算移动平均线,似乎都会产生一个较短的向量,切断窗口太大的起点和终点,例如:
ma <- function(x,n=5){filter(x,rep(1/n,n), sides=2)}
ma(index)
Time Series:
Start = 1
End = 21
Frequency = 1
[1] NA NA 2.395822 2.440451 2.483165 2.524124 2.563466 2.601315
[9] 2.637779 2.672957 2.706937 2.739798 2.771611 2.802441 2.832347 2.861383
[17] 2.889599 2.917039 2.943746 NA NA
Run Code Online (Sandbox Code Playgroud)
同样的
rollmean(index,5)
Run Code Online (Sandbox Code Playgroud)
来自动物园的包裹
有没有一种快速的方法来实现锥形窗口而不需要编写循环编码?
由于rollapply可以说是相当缓慢的,它往往是值得写一个简单的定制功能......
tapermean <- function(x, width=5){
taper <- pmin(width,
2*(seq_along(x))-1,
2*rev(seq_along(x))-1) #set taper pattern
lower <- seq_along(x)-(taper-1)/2 #lower index for each mean
upper <- lower+taper #upper index for each mean
x <- c(0, cumsum(x)) #sum x once
return((x[upper]-x[lower])/taper)} #return means
Run Code Online (Sandbox Code Playgroud)
这比rollapply解决方案快200倍以上......
library(microbenchmark)
index <- log(seq(10,200,by=0.5)) #longer version for testing
w <- c(seq(1,5,2),rep(5,length(index)-5-1),seq(5,1,-2)) #as in Scarabees answer
microbenchmark(tapermean(index),
rollapply(index,w,mean))
Unit: microseconds
expr min lq mean median uq max neval
tapermean(index) 185.562 193.9405 246.4123 210.6965 284.548 590.197 100
rollapply(index,w,mean) 48213.027 49681.0715 52053.7352 50583.4320 51756.378 97187.538 100
Run Code Online (Sandbox Code Playgroud)
我休息一下!
所述width的参数zoo::rollapply可以是一个数值向量.
因此,在您的示例中,您可以使用:
rollapply(index, c(1, 3, 5, rep(5, 15), 5, 3, 1), mean)
# [1] 2.302585 2.350619 2.395822 2.440451 2.483165 2.524124 2.563466 2.601315 2.637779 2.672957 2.706937 2.739798 2.771611 2.802441 2.832347 2.861383
# [17] 2.889599 2.917039 2.943746 2.970195 2.995732
Run Code Online (Sandbox Code Playgroud)
如果n是一个奇数,一般的解决方案是:
w <- c(seq(1, n, 2), rep(n, length(index) - n - 1), seq(n, 1, -2))
rollapply(index, w, mean)
Run Code Online (Sandbox Code Playgroud)
编辑:如果您关心性能,可以使用自定义Rcpp功能:
library(Rcpp)
cppFunction("NumericVector fasttapermean(NumericVector x, const int window = 5) {
const int n = x.size();
NumericVector y(n);
double s = x[0];
int w = 1;
for (int i = 0; i < n; i++) {
y[i] = s/w;
if (i < window/2) {
s += x[i + (w+1)/2] + x[i + (w+3)/2];
w += 2;
} else if (i > n - window/2 - 2) {
s -= x[i - (w-1)/2] + x[i - (w-3)/2];
w -= 2;
} else {
s += x[i + (w+1)/2] - x[i - (w-1)/2];
}
}
return y;
}")
Run Code Online (Sandbox Code Playgroud)
新基准:
n <- 5
index <- log(seq(10, 200, by = .5))
w <- c(seq(1, n, 2), rep(n, length(index) - n - 1), seq(n, 1, -2))
bench::mark(
fasttapermean(index),
tapermean(index),
zoo::rollapply(index, w, mean)
)
# # A tibble: 3 x 14
# expression min mean median max `itr/sec` mem_alloc n_gc n_itr total_time result memory time gc
# <chr> <bch:tm> <bch:tm> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <bch:tm> <list> <list> <list> <list>
# 1 fasttapermean(index) 4.7us 5.94us 5.56us 67.6us 168264. 5.52KB 0 10000 59.4ms <dbl [381]> <Rprofmem [2 x 3]> <bch:tm> <tibble [10,000 x 3]>
# 2 tapermean(index) 53.9us 79.68us 91.08us 405.8us 12550. 37.99KB 3 5951 474.2ms <dbl [381]> <Rprofmem [16 x 3]> <bch:tm> <tibble [5,954 x 3]>
# 3 zoo::rollapply(index, w, mean) 12.8ms 15.42ms 14.31ms 29.2ms 64.9 100.58KB 8 23 354.7ms <dbl [381]> <Rprofmem [44 x 3]> <bch:tm> <tibble [31 x 3]>
Run Code Online (Sandbox Code Playgroud)
但是,如果您关心(极端)精度,则应使用该rollapply方法,因为meanR 的内置算法比天真的求和除法更准确.
另请注意,该rollapply方法是唯一允许您na.rm = TRUE根据需要使用的方法.
| 归档时间: |
|
| 查看次数: |
397 次 |
| 最近记录: |