How can I modify my code to make it run faster?

Gui*_*cob 11 r

I have to run code like the following across the columns of a large matrix:

set.seed(1)
my_vector <- runif(10000)
my_sums <- NULL

for (l in 1:length(my_vector)) {
  current_result <- my_vector[my_vector < runif(1)]
  my_sums[l] <- sum(current_result)
}

head(my_sums)
# [1]   21.45613 2248.31463 2650.46104   62.82708   11.11391   86.21950

Timing with system.time():

   user  system elapsed 
   1.14    0.00    1.14

Any ideas on how to improve performance?
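Written out, the loop reduces to prefix sums over the sorted vector, which is what the faster answers below exploit. The notation here is mine, not the question's: $v_1,\dots,v_n$ are the entries of my_vector, $u_l$ is the threshold drawn by runif(1) in iteration $l$, and $v_{(1)} \le \dots \le v_{(n)}$ is the sorted vector. The values below $u_l$ are exactly the $k_l$ smallest ones, so:

$$
\begin{aligned}
s_l &= \sum_{i=1}^{n} v_i\,\mathbf{1}[v_i < u_l], && l = 1,\dots,n,\\
k_l &= \#\{\, i : v_i < u_l \,\},\\
s_l &= \sum_{j=1}^{k_l} v_{(j)} = S_{k_l}, && S_0 = 0,\; S_k = S_{k-1} + v_{(k)}.
\end{aligned}
$$

Each loop iteration scans all $n$ values, so the loop performs on the order of $n^2$ comparisons. Sorting once, computing the prefix sums $S_k$, and finding each $k_l$ by binary search costs $O(n \log n)$ instead.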

Kha*_*haa 19

Matt Dowle's excellent data.table approach, in base R:

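system.time({
  set.seed(1)
  my_vector <- runif(10000)
  x <- runif(10000)                      # the thresholds the loop draws one at a time
  sorted <- sort(my_vector)
  ind <- findInterval(x, sorted) + 1     # count of sorted values <= x, shifted by one
  my_sums <- c(0, cumsum(sorted))[ind]   # prefix sums; leading 0 covers x < min(my_vector)
})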

#   user  system elapsed 
#      0       0       0 

head(my_sums)
#[1]   21.45613 2248.31463 2650.46104   62.82708   11.11391   86.21950
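findInterval does the heavy lifting here: for each x it returns how many elements of the sorted vector are <= x. The question's loop uses a strict <, which findInterval also supports through its left.open = TRUE argument. With continuous random data exact ties are rare, so the default call matches in practice. A small sketch on made-up numbers (the values are illustrative only):

```r
sorted <- c(0.1, 0.3, 0.5)
x <- c(0.05, 0.3, 0.4, 0.7)

# Default: counts elements of sorted that are <= x
counts_le <- findInterval(x, sorted)
# left.open = TRUE: counts elements of sorted that are strictly < x,
# matching the question's my_vector < runif(1)
counts_lt <- findInterval(x, sorted, left.open = TRUE)

# Prefix sums with a leading 0 for "no values below x"
prefix <- c(0, cumsum(sorted))
sums <- prefix[counts_lt + 1]

print(counts_le)  # 0 2 2 3
print(counts_lt)  # 0 1 2 3
print(sums)       # 0.0 0.1 0.4 0.9
```

Only the tie at x = 0.3 separates the two calls.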

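  • +1 for being super fast. However, you should fix a bug: values of `x` smaller than `min(my_vector)` get index 0, and subsetting `cumsum(sorted)` with index 0 returns nothing for them, so they are dropped and `my_sums` ends up shorter than `my_vector`. The fix is `my_sums <- c(0, cumsum(sorted))[ind + 1]` (5 upvotes)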
  • Why edit the `0` to `-Inf`? @dww's solution in the comment was correct: for `x < min(my_vector)` it returns `0`. Now it returns `-Inf` for those values instead. (2 upvotes)
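The bug from the first comment is easy to reproduce on a small vector. In R, a zero index is silently dropped when subsetting, so the result loses elements instead of getting a 0. Prepending 0 to the prefix sums and shifting the index by one fixes that. Prepending -Inf, as the second comment points out, would put -Inf in those positions instead. A toy sketch (the numbers are illustrative):

```r
sorted <- c(0.1, 0.3, 0.5, 0.9)
x <- c(0.2, 0.6, 0.05, 1)              # 0.05 lies below min(sorted)
ind <- findInterval(x, sorted)         # 1 3 0 4

# Buggy: the 0 index is dropped, so only 3 sums come back for 4 thresholds
bad <- cumsum(sorted)[ind]

# Fixed: a leading 0 represents the empty sum, reached via index 0 + 1
good <- c(0, cumsum(sorted))[ind + 1]

# What the -Inf edit would produce for the below-minimum threshold
inf_version <- c(-Inf, cumsum(sorted))[ind + 1]

print(length(bad))   # 3
print(good)          # 0.1 0.9 0.0 1.8
print(inf_version)   # 0.1 0.9 -Inf 1.8
```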

Mat*_*wle 14

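library(data.table)

system.time({
  set.seed(1)
  my_vector = runif(10000)
  DT = data.table(my_vector)
  setkey(DT, my_vector)                 # sort by my_vector and use it as the join key
  DT[, cumsum := cumsum(my_vector)]     # running sum of the sorted values, added by reference
  my_sums = DT[.(runif(10000)), cumsum, roll = TRUE]  # rolling join: last key <= each draw
  my_sums[is.na(my_sums)] = 0           # draws below min(my_vector) have no match
})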

head(my_sums)
# [1]   21.45613 2248.31463 2650.46104   62.82708   11.11391   86.21950

#   user  system elapsed 
#  0.004   0.000   0.004
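The rolling join is the data.table counterpart of findInterval. After setkey, roll = TRUE matches each lookup value to the row with the largest key at or below it (the previous row is rolled forward when there is no exact match), and lookups below the smallest key get NA, which is why the answer replaces NA with 0. A toy sketch, assuming data.table is installed (the block is skipped otherwise); the column names v and cs are illustrative:

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  DT <- data.table(v = c(0.5, 0.1, 0.9, 0.3))
  setkey(DT, v)                 # DT is now sorted: 0.1, 0.3, 0.5, 0.9
  DT[, cs := cumsum(v)]         # running sums: 0.1, 0.4, 0.9, 1.8

  # Each lookup value takes cs from the last row whose key is <= it
  res <- DT[.(c(0.2, 0.6, 0.05, 1)), cs, roll = TRUE]
  res[is.na(res)] <- 0          # 0.05 is below every key, so it had no match
  print(res)                    # 0.1 0.9 0.0 1.8
}
```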


J_F*_*J_F 1

What about sapply?

temp <- sapply(seq_along(my_vector), function(l) {
  # Return the sum directly; assigning to my_sums[l] inside the function
  # would only modify a local copy of my_sums
  sum(my_vector[my_vector < runif(1)])
})

Does this give any performance improvement?
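One way to answer that is to time the base R approaches side by side and confirm they agree. A minimal sketch, assuming the thresholds are drawn up front with a single runif(n) call, which under R's default generator yields the same numbers, in the same order, as n separate runif(1) calls. The variable names are illustrative, and no timings are claimed here since they depend on the machine:

```r
set.seed(1)
n <- 10000
my_vector <- runif(n)
thresholds <- runif(n)  # same draws the question's runif(1) calls would make

# 1. The question's loop, with the result preallocated instead of grown from NULL
t_loop <- system.time({
  loop_sums <- numeric(n)
  for (l in seq_len(n)) loop_sums[l] <- sum(my_vector[my_vector < thresholds[l]])
})

# 2. sapply: still one full scan of my_vector per threshold
t_sapply <- system.time({
  sapply_sums <- sapply(thresholds, function(u) sum(my_vector[my_vector < u]))
})

# 3. Sort once, build prefix sums, then binary-search each threshold
t_sorted <- system.time({
  sorted <- sort(my_vector)
  k <- findInterval(thresholds, sorted, left.open = TRUE)  # count of values < threshold
  sorted_sums <- c(0, cumsum(sorted))[k + 1]
})

# All three agree up to rounding from the different summation order
stopifnot(isTRUE(all.equal(loop_sums, sapply_sums)),
          isTRUE(all.equal(loop_sums, sorted_sums)))

print(c(loop = t_loop[["elapsed"]],
        sapply = t_sapply[["elapsed"]],
        sorted = t_sorted[["elapsed"]]))
```

sapply keeps the loop's structure of one full pass over my_vector per threshold, so its cost stays on the order of n²; only the sort-based version changes the complexity.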