R: Conditional sum over a numeric vector

Ant*_*tti 7 loops r

I have a vector of numeric values. For example:

inVector <- c(2, -10, 5, 34, 7)

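I need to transform it so that whenever a negative element is encountered, it is added to the following elements until the running sum becomes positive: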

outVector <- c(2, 0, 0, 29, 7)

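Negative elements are set to zero so that the total is preserved. So elements 2 and 3 become zero, and the fourth element equals 29 = -10 + 5 + 34. I tried a for-loop solution like this: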

outVector <- numeric(length = length(inVector))

for(i in 1:length(inVector)) {
   outVector <- inVector
   outVector[i] <- ifelse(outVector[i] < 0, 0, outVector[i])
   outVector[i + 1] <- ifelse(outVector[i] == 0, sum(inVector[i:(i+1)]), outVector[i + 1])
   outVector <- outVector[1:length(inVector)]
   }

But that didn't work. In any case, I'm most interested in a solution that works in a dplyr pipeline.

Pie*_*une 6

If we want to optimize, we can use the more efficient Reduce function to iterate over the vector:

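# Helper function
zeroElement <- function(vec) {
  # Carry a negative running value into the next element; otherwise restart from it
  r <- Reduce(function(x, y) if (x >= 0) y else sum(x, y), vec, accumulate = TRUE)
  r[r < 0] <- 0
  return(r)
}

# Use function
zeroElement(inVector)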
#[1]  2  0  0 29  7
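To see why the Reduce call works, it can help to look at the accumulated values before the negatives are zeroed. The following is a minimal sketch on the question's `inVector`; the helper name `step` is only illustrative and is not part of the answer.

```r
inVector <- c(2, -10, 5, 34, 7)

# Illustrative helper: carry a negative running value into the next element,
# otherwise start fresh from the next element.
step <- function(x, y) if (x >= 0) y else x + y

running <- Reduce(step, inVector, accumulate = TRUE)
running
# [1]   2 -10  -5  29   7

# Negative intermediate values are exactly the positions that become zero.
result <- replace(running, running < 0, 0)
result
# [1]  2  0  0 29  7
```

One caveat of this approach: if the last running value is still negative, zeroing it drops that deficit, so the total is preserved only when the vector ends on a non-negative running value.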

Speed test: about 25% faster:

t3 <- MakeNonNeg(BigVec)
t4 <- zeroElement(BigVec)
all.equal(t3, t4)
#[1] TRUE
library(microbenchmark)
microbenchmark(
  makeNonNeg = MakeNonNeg(BigVec),
  zeroElement = zeroElement(BigVec),
  times=10)
# Unit: seconds
#        expr      min       lq     mean   median       uq      max neval cld
#  makeNonNeg 2.047484 2.099289 2.195988 2.111135 2.248381 2.531009    10   b
# zeroElement 1.529257 1.580789 1.666000 1.664855 1.725528 1.837825    10  a

Adding session info for comparison:

sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)


Jos*_*ood 5

Try this:

MakeNonNeg <- function(v) {
    size <- length(v)
    myOut <- as.numeric(v)
    if (size > 1L) {
        for (i in 1:(size-1L)) {
            if (myOut[i] >= 0) {next}
            myOut[i+1L] <- myOut[i]+myOut[i+1L]
            myOut[i] <- 0
        }
    }
    myOut
}

MakeNonNeg(inVector)
[1]  2  0  0 29  7
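Since the question asks for something usable in a dplyr pipeline, here is a hedged sketch of one way to do that: a vector-in, vector-out function like the one above can be called inside `mutate()`, optionally per group. The function name `make_non_neg`, the data frame `df`, and its columns are made up for illustration; the block assumes the dplyr package is installed and skips the pipeline otherwise.

```r
# Compact copy of the loop-based function (hypothetical name) so the block runs alone.
make_non_neg <- function(v) {
  out <- as.numeric(v)
  for (i in seq_len(max(length(out) - 1L, 0L))) {
    if (out[i] < 0) {
      out[i + 1L] <- out[i] + out[i + 1L]  # push the deficit forward
      out[i] <- 0
    }
  }
  out
}

# Made-up example data: two groups, each processed independently.
df <- data.frame(id = c("a", "a", "a", "a", "a", "b", "b"),
                 x  = c(2, -10, 5, 34, 7, -3, 4))

# Assumes dplyr is installed; the pipeline is skipped if it is not.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  res <- df %>%
    group_by(id) %>%
    mutate(y = make_non_neg(x)) %>%
    ungroup()
  print(res$y)
  # [1]  2  0  0 29  7  0  1
}
```

Without `group_by()`, the same `mutate()` call applies the carry-forward across the whole column.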

Here is a more elaborate example:

set.seed(4242)

BigVec <- sample(-40000:100000, 100000, replace = TRUE)
gmp::sum.bigz(BigVec)
Big Integer ('bigz') :
    [1] 2997861106

t3 <- MakeNonNeg(BigVec)
gmp::sum.bigz(t3)
Big Integer ('bigz') :
    [1] 2997861106

BigVec[1:20]
[1]  98056   8680  -7814  53620  58390  90832  74970 -16392  52648  83779 -17229  38484 -36589  75156  71200  95968 -11599  57705
[19]  19209 -21596

t3[1:20]
[1] 98056  8680     0 45806 58390 90832 74970     0 36256 83779     0 21255     0 38567 71200 95968     0 46106 19209     0
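The gmp call above is presumably there because the total exceeds R's integer range: base `sum()` on an integer vector returns NA with a warning in that case. As a sketch using base R only, the same checks can be done by summing as doubles. The carry-forward rule is written inline with Reduce here so the block runs on its own; because `sample()` changed in later R versions, the values will likely differ from those shown above, so no totals are quoted.

```r
set.seed(4242)
BigVec <- sample(-40000:100000, 100000, replace = TRUE)

# Sum as doubles: the total can exceed .Machine$integer.max (2147483647).
total_in <- sum(as.numeric(BigVec))

# Same carry-forward rule as in the answers, written inline.
running <- Reduce(function(x, y) if (x >= 0) y else x + y,
                  as.numeric(BigVec), accumulate = TRUE)
out <- pmax(running, 0)

# Invariant 1: no negative values remain after zeroing.
stopifnot(all(out >= 0))

# Invariant 2: the total is preserved, provided the last running value is
# not negative (otherwise the final deficit is dropped by the zeroing step).
if (running[length(running)] >= 0) {
  stopifnot(isTRUE(all.equal(sum(out), total_in)))
}
```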

Here is my system info:

sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Here are the timings for both functions with JIT disabled.

microbenchmark(
    makeNonNeg = MakeNonNeg(BigVec),
    zeroElement = zeroElement(BigVec),
    times=10)
Unit: milliseconds
       expr      min       lq     mean   median       uq      max neval
 makeNonNeg 254.1255 255.8430 267.9527 258.6369 277.0222 303.6516    10
zeroElement 152.0358 164.7988 175.3191 166.4948 198.3855 209.8739    10

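With JIT enabled, we get very different results for MakeNonNeg. However, the results for zeroElement don't change much (I think because Reduce makes up the bulk of the function and is already byte-compiled, so there is little room for improvement).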

library(compiler)
enableJIT(3)
[1] 0

microbenchmark(
    makeNonNeg = MakeNonNeg(BigVec),
    zeroElement = zeroElement(BigVec),
    times=10)
Unit: milliseconds
       expr       min        lq      mean    median        uq       max neval
 makeNonNeg  11.20514  11.55366  12.76953  11.84655  12.20554  20.60036    10
zeroElement 144.15123 149.33591 163.66421 157.34711 176.20139 198.57268    10

So with JIT disabled, zeroElement is about 50% faster, and with JIT enabled, MakeNonNeg is about 13 times faster.
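A related option, sketched here as an aside, is to byte-compile a single function explicitly with `compiler::cmpfun` instead of switching the JIT on globally. The function name `carry_loop` is a hypothetical compact copy of the loop-based answer. Note that since R 3.4.0 the JIT is enabled by default at level 3, so on a current R installation the "JIT disabled" timings above would need `enableJIT(0)` to reproduce.

```r
library(compiler)

# Hypothetical compact copy of the loop-based function.
carry_loop <- function(v) {
  out <- as.numeric(v)
  for (i in seq_len(max(length(out) - 1L, 0L))) {
    if (out[i] < 0) {
      out[i + 1L] <- out[i] + out[i + 1L]  # push the deficit forward
      out[i] <- 0
    }
  }
  out
}

# Explicitly byte-compiled version of the same function.
carry_loop_cmp <- cmpfun(carry_loop)

v <- c(2, -10, 5, 34, 7)
identical(carry_loop(v), carry_loop_cmp(v))
# [1] TRUE
```

Compilation changes only the speed, not the result, so the two versions can be benchmarked against each other with microbenchmark in the same way as above.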

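  • @PierreLafortune, yes, it's great! That package has many useful functions that can really help improve performance. It seems to work best on very basic constructs such as `for` loops and `while` loops. Take a look at these sites: [FasteR! HigheR! StrongeR!](http://www.noamross.net/blog/2013/4/25/faster-talk.html) and [Speed up your R code...](https://www.r-statistics.com/2012/04/speed-up-your-r-code-using-a-just-in-time-jit-compiler/). (2 upvotes)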