Find points in a vector where the change is greater than a threshold

dww*_*dww 7 r

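I want to find the positions in a vector where the value differs from an earlier point in the vector by more than some threshold. The first change point should be measured relative to the first value in the vector. Each subsequent change point should be measured relative to the previous change point.

I can do this with a for loop, but I wonder whether there is a more idiomatic and faster vectorized solution.

Minimal example: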

set.seed(123)
x = cumsum(rnorm(500))

mindiff = 5.0
start = x[1]
changepoints = integer()

for (i in 1:length(x)) {
  if (abs(x[i] - start) > mindiff) {
    changepoints = c(changepoints, i)
    start = x[i]
  }
}

plot(x, type = 'l')
points(changepoints, x[changepoints], col='red')

d.b*_*d.b 3

Implementing the same code in Rcpp can help with speed.

library(Rcpp)
cppFunction(
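  "IntegerVector foo(NumericVector vect, double difference){
    // 0-based index of the current reference point (starts at the first value)
    int start = 0;
    IntegerVector changepoints;
    for (int i = 0; i < vect.size(); i++){
      // same test as abs(x[i] - start) > mindiff in the R loop
      if((vect[i] - vect[start]) > difference || (vect[start] - vect[i]) > difference){
        // store the 1-based position so the result matches R indexing
        changepoints.push_back(i + 1);
        start = i;
      }
    }
    return changepoints;
  }"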
  )

foo(vect = x, difference = mindiff)
# [1]  17  25  56  98 108 144 288 297 307 312 403 470 487

identical(foo(vect = x, difference = mindiff), changepoints)
#[1] TRUE
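For readers who would rather stay in base R without writing the loop by hand, one possible sketch uses `Reduce` with `accumulate = TRUE` to carry the current reference value along the vector; the change points are then the positions where that reference value changes. The function name `find_changepoints_reduce` is hypothetical, and the sketch assumes `mindiff` is non-negative and `x` has no `NA`s. This is still a sequential scan under the hood, so it should not be expected to beat the Rcpp version.

```r
# Hypothetical helper (not from the original answer): base R sketch using Reduce.
find_changepoints_reduce <- function(x, mindiff) {
  if (length(x) < 2L) return(integer())
  # ref[i] is the reference value in effect after element i has been processed;
  # without an explicit init, Reduce starts from x[1]
  ref <- Reduce(function(start, v) if (abs(v - start) > mindiff) v else start,
                x, accumulate = TRUE)
  # the reference value changes exactly at the change points
  which(diff(ref) != 0) + 1L
}

find_changepoints_reduce(c(0, 1, 6, 7, 12, 0), mindiff = 5)
# [1] 3 5 6
```

With `x`, `mindiff` and `changepoints` from the question, `identical(find_changepoints_reduce(x, mindiff), changepoints)` can be used to check the result against the original loop.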

Benchmarking

#DATA
set.seed(123)
x = cumsum(rnorm(1e5))
mindiff = 5.0

library(microbenchmark)
microbenchmark(baseR = {start = x[1]
changepoints = integer()

for (i in 1:length(x)) {
    if (abs(x[i] - start) > mindiff) {
        changepoints = c(changepoints, i)
        start = x[i]
    }
}}, Rcpp = foo(vect = x, difference = mindiff))
#Unit: milliseconds
#  expr        min        lq      mean    median        uq      max neval cld
# baseR 117.194668 123.07353 125.98741 125.56882 127.78463 139.5318   100   b
#  Rcpp   7.907011  11.93539  14.47328  12.16848  12.38791 263.2796   100  a 
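The `baseR` loop in the benchmark grows `changepoints` with `c()` every time a change point is found. A minimal sketch of a variant that instead marks hits in a preallocated logical vector and extracts the positions once at the end is shown here; the function name `find_changepoints_prealloc` is hypothetical, and no timing is claimed for it since it was not part of the benchmark above.

```r
# Hypothetical helper (not from the original answer): same logic as the
# question's loop, but without growing the result vector inside the loop.
find_changepoints_prealloc <- function(x, mindiff) {
  n <- length(x)
  if (n == 0L) return(integer())
  is_change <- logical(n)   # preallocated, all FALSE
  start <- x[1]
  for (i in seq_len(n)) {
    if (abs(x[i] - start) > mindiff) {
      is_change[i] <- TRUE
      start <- x[i]
    }
  }
  which(is_change)          # integer positions of the change points
}

find_changepoints_prealloc(c(0, 1, 6, 7, 12, 0), mindiff = 5)
# [1] 3 5 6
```

It can be added to the `microbenchmark()` call as another named expression to compare it with `baseR` and `Rcpp` on the same data.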