
Faster weighted sampling without replacement

This question led to a new R package: wrswoR

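R's default sampling without replacement using sample.int seems to require quadratic run time, e.g. when using weights drawn from a uniform distribution. This is slow for large sample sizes. Does anybody know of a faster implementation that can be used from within R? Two options are "rejection sampling with replacement" (see this question on stats.sx) and the algorithm by Wong and Easton (1980) (with a Python implementation in a StackOverflow answer).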
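To make the first option concrete, here is a minimal sketch of rejection sampling with replacement, assuming all weights are non-negative and at least `size` of them are positive. The function name `sample_int_rejection` and the oversampling factor of 2 are illustrative choices, not taken from the linked question or from any package.

```r
# Hypothetical helper (not part of base R or any package): weighted sampling
# without replacement by drawing WITH replacement and rejecting duplicates.
sample_int_rejection <- function(n, size, prob) {
  stopifnot(length(prob) == n, size <= n, sum(prob > 0) >= size)
  result <- integer(0)
  while (length(result) < size) {
    # Oversample by an assumed factor of 2 to compensate for duplicates.
    need <- size - length(result)
    draws <- sample.int(n, 2 * need, replace = TRUE, prob = prob)
    # unique() keeps first occurrences in draw order, so items that were
    # already selected (or repeated within this batch) are rejected.
    result <- unique(c(result, draws))
  }
  # Drop any surplus items drawn in the last batch.
  result[seq_len(size)]
}

set.seed(1)
x <- sample_int_rejection(1000, 500, runif(1000))
```

Each batch relies on sampling with replacement, which is fast in R, so the cost depends mainly on how many draws are rejected. The approach can degrade when a few items carry most of the weight, because those items are then drawn repeatedly.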

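Thanks to Ben Bolker for hinting at the C function that sample.int calls internally when invoked with replace=FALSE and non-uniform weights: ProbSampleNoReplace. Indeed, the code shows two nested for loops (line 420 of random.c).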
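The effect of two nested loops can be illustrated with a simplified R sketch. This is not the C source of ProbSampleNoReplace, only an assumed rendering of the same general idea: for each of the `size` draws, scan linearly through the remaining weights. The name `sample_int_naive` is hypothetical.

```r
# Illustrative only (hypothetical function, deliberately slow): one linear
# scan per drawn item, so total work grows like size * n.
sample_int_naive <- function(n, size, prob) {
  stopifnot(length(prob) == n, size <= n)
  items <- seq_len(n)
  ans <- integer(size)
  for (i in seq_len(size)) {            # outer loop: one pass per drawn item
    target <- runif(1) * sum(prob)      # random point in the remaining mass
    mass <- 0
    j <- 1
    while (j < length(prob)) {          # inner loop: scan remaining items
      mass <- mass + prob[j]
      if (target <= mass) break
      j <- j + 1
    }
    ans[i] <- items[j]
    # Remove the chosen item so it cannot be drawn again.
    items <- items[-j]
    prob <- prob[-j]
  }
  ans
}

set.seed(42)
y <- sample_int_naive(50, 25, runif(50))
```

When `size` is proportional to `n`, as in the benchmark in this question, the number of inner-loop steps is proportional to n squared, which matches the observed quadratic run time.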

Here is the code to analyze the run time empirically:

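library(plyr)

# Time one weighted draw of n items out of 2 * n without replacement;
# the sample itself is discarded.
sample.int.test <- function(n, p) {
  sample.int(2 * n, n, replace = FALSE, prob = p)
  NULL
}

# Measure user CPU time for n = 2048, 4096, ..., 131072.
times <- ldply(
  1:7,
  function(i) {
    n <- 1024 * 2^i
    p <- runif(2 * n)  # weights drawn from a uniform distribution
    data.frame(
      n = n,
      user = system.time(sample.int.test(n, p), gcFirst = TRUE)[['user.self']])
  },
  .progress = 'text'
)

times

# Plot time per sampled item against n on a logarithmic x axis.
library(ggplot2)
ggplot(times, aes(x = n, y = user / n)) + geom_point() + scale_x_log10() +
  ylab('Time per unit (s)')  # …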
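One way to turn such timings into a single number is to fit a straight line on a log-log scale: the slope estimates the exponent of the run time as a function of n. The sketch below is an assumption about how one might do this, not part of the original analysis. The helper `scaling_exponent` is hypothetical, and the timings are synthetic values constructed to be exactly quadratic, not measurements.

```r
# Hypothetical helper: estimate the exponent k in seconds ~ c * n^k
# as the slope of a least-squares fit on the log-log scale.
scaling_exponent <- function(n, seconds) {
  fit <- lm(log(seconds) ~ log(n))
  unname(coef(fit)[2])
}

# Synthetic, exactly quadratic timings (NOT measured data).
n <- 1024 * 2^(1:7)
synthetic_seconds <- 1e-9 * n^2
k <- scaling_exponent(n, synthetic_seconds)
```

With the data frame from the benchmark, the call would be `scaling_exponent(times$n, times$user)`. A slope near 2 would support the quadratic hypothesis. Timings of exactly zero have to be dropped first, since their logarithm is not finite.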


algorithm performance r

krl*_*mlr

2017-05-23

48
Votes

2
Answers

5619
Views
