Mr.*_*Zen 2 r sample probability apply
我需要在给定具有行方式结果概率的矩阵的情况下对结果变量进行采样.
set.seed(1010) #reproducibility
#create a matrix of probabilities
#three possible outcomes, 10.000 cases
probabilities <- matrix(runif(10000*3),nrow=10000,ncol=3)
probabilities <- probabilities / Matrix::rowSums(probabilities)
Run Code Online (Sandbox Code Playgroud)
我能想出的最快方法是apply()和sample()的组合.
#row-wise sampling using these probabilities
classification <- apply(probabilities, 1, function(x) sample(1:3, 1, prob = x))
Run Code Online (Sandbox Code Playgroud)
但是,在我正在做的事情中,这是计算瓶颈.您是否知道如何加快此代码速度/如何更有效地进行采样?
谢谢!
RLave的意见,即Rcpp可能是要走的路是现货(你还需要RcppArmadillo进行sample()); 我使用以下C++代码来创建这样的函数:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector sample_matrix(NumericMatrix x, IntegerVector choice_set) {
int n = x.nrow();
IntegerVector result(n);
for ( int i = 0; i < n; ++i ) {
result[i] = RcppArmadillo::sample(choice_set, 1, false, x(i, _))[0];
}
return result;
}
Run Code Online (Sandbox Code Playgroud)
然后我通过我的R会话使该功能可用
Rcpp::sourceCpp("sample_matrix.cpp")
Run Code Online (Sandbox Code Playgroud)
现在我们可以在R中针对您的初始方法以及其他使用建议进行测试,purrr::map()并且lapply():
set.seed(1010) #reproducibility
#create a matrix of probabilities
#three possible outcomes, 10.000 cases
probabilities <- matrix(runif(10000*3),nrow=10000,ncol=3)
probabilities <- probabilities / Matrix::rowSums(probabilities)
probabilities_list <- split(probabilities, seq(nrow(probabilities)))
library(purrr)
library(microbenchmark)
microbenchmark(
apply = apply(probabilities, 1, function(x) sample(1:3, 1, prob = x)),
map = map(probabilities_list, function(x) sample(1:3, 1, prob = x)),
lapply = lapply(probabilities_list, function(x) sample(1:3, 1, prob = x)),
rcpp = sample_matrix(probabilities, 1:3),
times = 100
)
Unit: milliseconds
expr min lq mean median uq max neval
apply 307.44702 321.30051 339.85403 342.36421 350.86090 434.56007 100
map 254.69721 265.10187 282.85592 286.21680 295.48886 363.95898 100
lapply 249.68224 259.70178 280.63066 279.87273 287.10062 691.21359 100
rcpp 12.16787 12.55429 13.47837 13.81601 14.25198 16.84859 100
cld
c
b
b
a
Run Code Online (Sandbox Code Playgroud)
节省的时间相当可观.