And*_*d_R | 2 | tags: performance, multicore, r, sample, weighted
How to speed up probability-weighted sampling in R?
# Consider the following example:
w <- sample(1:4000, size = 2e6, replace = TRUE)
# "w" is an integer vector, so convert it to numeric.
w <- as.numeric(w)
# In practice, the sampling has to be repeated many times.
M <- matrix(NA, 10, 2000)
system.time(
for (r in 1:10){
ix <- sample(1:2e6,size=2000,prob=w/sum(w))
M[r,] <- ix
})
# It's worth mentioning that without "prob=w/sum(w)" the sampling is considerably faster.
# The main goal is to speed up sampling with probability weights!
system.time(ix <- sample(1:2e6,size=2000,prob=w/sum(w)))
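Weighted sampling takes 9.84 seconds, versus 0.01 seconds for unweighted sampling. If you have any ideas on how to speed up weighted sampling, please feel free to answer.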
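The speed problem is limited to weighted sampling without replacement. Here is your code, with everything that does not need to be inside the sample loop moved out of it.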
# Normalize the weights once, outside the loops.
normalized_weights <- w/sum(w)
# No weights
system.time(
for (r in 1:10){
ix <- sample(2e6, size = 2000)
})
# Weighted, no replacement
system.time(
for (r in 1:10){
ix <- sample(2e6, size = 2000, prob = normalized_weights)
})
# Weighted, with replacement
system.time(
for (r in 1:10){
ix <- sample(2e6, size = 2000, replace = TRUE, prob = normalized_weights)
})
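The main issue is that with weighted sampling without replacement, the weights have to be recomputed over the remaining items every time a value is chosen. See ?sample: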
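"If 'replace' is false, these probabilities are applied sequentially, that is the probability of choosing the next item is proportional to the weights amongst the remaining items."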
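To make the quoted rule concrete, here is a deliberately naive sketch that applies it literally: draw one index with probability proportional to the current weights, set that weight to zero, and repeat. The name sequential_weighted_sample is a hypothetical helper, not part of base R. Every iteration hands all n weights to sample.int(), so the work per draw grows with n; this illustrates the rule and is not meant as a faster replacement for sample().

```r
# Hypothetical helper, for illustration only: applies the ?sample rule literally.
sequential_weighted_sample <- function(n, size, prob) {
  stopifnot(length(prob) == n, all(prob >= 0), size <= sum(prob > 0))
  remaining <- prob               # weights of items not yet chosen
  chosen <- integer(size)
  for (k in seq_len(size)) {
    # One draw with probability proportional to the remaining weights;
    # sample.int() accepts unnormalized weights and processes all n of them.
    i <- sample.int(n, 1L, prob = remaining)
    chosen[k] <- i
    remaining[i] <- 0             # a chosen item can never be drawn again
  }
  chosen
}

set.seed(1)
sequential_weighted_sample(10, 3, prob = 1:10)
```

Zeroing the chosen weight is what "amongst the remaining items" means in practice: each later draw is renormalized over a smaller total.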
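There may be faster solutions than sample (I don't know how well optimized it is), but this is inherently more computationally intensive than unweighted sampling or weighted sampling with replacement.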
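One possible faster approach, which is not from the original answer, is the exponential-keys method (Efraimidis and Spirakis): give item i the key E_i / w_i with E_i ~ Exp(1), and keep the size smallest keys. E_i / w_i is an exponential waiting time with rate w_i, so the smallest key belongs to item i with probability w_i / sum(w), and by memorylessness each following key is again won in proportion to the remaining weights. Ordering the selected keys therefore reproduces the sequential rule quoted from ?sample in distribution, though not draw-for-draw with sample() for the same seed. A minimal sketch, assuming non-negative weights with at least size of them positive; exp_key_sample is a hypothetical helper name. It needs one rexp() call and a partial sort, so it should scale roughly linearly in n, but no timings are claimed here.

```r
# Hypothetical helper (not from the original answer): weighted sampling
# without replacement via exponential keys (Efraimidis and Spirakis).
exp_key_sample <- function(n, size, prob) {
  stopifnot(length(prob) == n, all(prob >= 0), size >= 1, size <= sum(prob > 0))
  # Key E_i / w_i with E_i ~ Exp(1): an Exp(rate = w_i) waiting time.
  # Zero-weight items get key Inf and cannot be among the `size` smallest.
  keys <- rexp(n) / prob
  # Partial sort: only guarantees position `size` holds the size-th smallest key.
  kth <- sort(keys, partial = size)[size]
  idx <- which(keys <= kth)
  # Order the selected items by key, i.e. in sequential draw order.
  idx <- idx[order(keys[idx])]
  idx[seq_len(size)]
}

# Same setup as the question; the weights need not be normalized.
set.seed(1)
w <- as.numeric(sample(1:4000, size = 2e6, replace = TRUE))
system.time(ix <- exp_key_sample(2e6, 2000, w))
```

Because only the keys are sorted, no weight is ever renormalized; the renormalization in the sequential rule is handled implicitly by the memoryless property of the exponential distribution.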