Lau*_*ura 7 r vector probability
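Hello, and thanks in advance for your help!
I am trying to generate a vector with a specific number of values that are assigned according to a probability distribution. For example, I want a vector of length 31 that contains 26 zeros and 5 ones. (The sum of the vector should always be five.) However, the positions of the ones matter. To determine which values should be one and which should be zero, I have a vector of probabilities (length 31), which looks like this: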
probs<-c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,0.01,0.02,0.01)
I can select values according to this distribution and get a vector of length 31 using rbinom, but I cannot select exactly five values.
Inv=rbinom(length(probs),1,probs)
Inv
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0
Any ideas?
Thanks again!
Jam*_*mes 10
How about using a weighted sample.int to select the positions?
Inv<-integer(31)
Inv[sample.int(31,5,prob=probs)]<-1
Inv
[1] 0 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
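If this is needed in more than one place, the sample.int idea could be wrapped in a small helper. The following is only a sketch: the name draw_fixed_ones and its argument k are made up for illustration, and the input check is an assumption about what you would want to guard against. Keep in mind that, in general, sampling positions without replacement does not give the same distribution as drawing independent Bernoulli trials and keeping only those that sum to five.

```r
probs <- c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
           0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,0.01,0.02,0.01)

## Hypothetical helper: returns a 0/1 vector with exactly k ones,
## with positions chosen by weighted sampling without replacement.
draw_fixed_ones <- function(probs, k) {
    ## k cannot exceed the number of positions with a positive weight
    stopifnot(is.numeric(probs), all(probs >= 0), k >= 0, k <= sum(probs > 0))
    out <- integer(length(probs))
    out[sample.int(length(probs), k, prob = probs)] <- 1L
    out
}

set.seed(42)
x <- draw_fixed_ones(probs, 5)
sum(x)  ## always 5, whatever the seed
```

sample.int rescales the weights internally, so probs does not have to sum to one.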
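Chase provides a great answer and mentions the problem of a runaway while() iteration. One problem with a runaway while() is that if you draw one trial at a time and it takes many, say t, trials to find one with the target number of 1s, you incur the overhead of t calls to the main function, rbinom() in this case.
There is a way out, however. Because rbinom(), like all of R's (pseudo)random number generators, is vectorised, we can generate m trials at a time and check those m trials for conformance with the requirement of five 1s. If none conform, we keep drawing m trials until we find one that does. This idea is implemented in the function foo() below. The chunkSize argument is m, the number of trials to draw at a time. I also took the opportunity to let the function find more than one conforming trial; the argument n controls how many conforming trials are returned.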
foo <- function(probs, target, n = 1, chunkSize = 100) {
    len <- length(probs)
    out <- matrix(ncol = len, nrow = 0)  ## return object
    found <- 0                           ## conforming trials collected so far
    ## draw chunkSize trials at a time until n conforming trials are found
    while (found < n) {
        ## each row is one trial; byrow = TRUE lines probs up with the columns
        trial <- matrix(rbinom(len * chunkSize, 1, probs),
                        ncol = len, byrow = TRUE)
        rs <- rowSums(trial)               ## how many `1`s in each trial
        ok <- which(rs == target)          ## which trials meet the `target`
        New <- min(n - found, length(ok))  ## keep only as many as still needed
        if (New > 0) {
            out <- rbind(out, trial[ok[seq_len(New)], , drop = FALSE])
            found <- found + New
        }
    }
    if (n == 1L)          ## comment this, and
        out <- drop(out)  ## this if you don't want dimension dropping
    out
}
It works like this:
> set.seed(1)
> foo(probs, target = 5)
[1] 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0
[31] 0
> foo(probs, target = 5, n = 2)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]    0    0    0    0    0    0    0    0    0     0     0
[2,]    0    0    0    0    0    0    0    0    0     0     1
     [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
[1,]     0     0     0     1     1     0     0     0     0     0
[2,]     0     1     0     0     1     0     0     0     0     0
     [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31]
[1,]     1     0     1     0     0     0     1     0     0     0
[2,]     1     0     1     0     0     0     0     0     0     0
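Note that I drop the empty dimension when n == 1. Comment out the last if code chunk if you don't want this behaviour.
You need to balance chunkSize against the computational burden of checking that many trials at a time. If the requirement (here five 1s) is very unlikely, increase chunkSize so that you make fewer calls to rbinom(). If the requirement is likely, there is little point in drawing a large chunkSize of trials at a time when you only want one or two, because every trial drawn has to be evaluated.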
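As a rough guide for choosing chunkSize, one could work out how likely a single trial is to hit the target. The sketch below computes the exact distribution of the number of 1s across independent Bernoulli draws (a Poisson binomial distribution) by adding one probability at a time. The names pmf, p_hit and expected_trials are made up for illustration and are not part of foo().

```r
probs <- c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
           0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,0.01,0.02,0.01)
target <- 5

## pmf[j + 1] is P(number of 1s == j) over the probabilities processed so far
pmf <- 1
for (p in probs) {
    ## either the new draw is 0 (count unchanged) or 1 (count shifts up by one)
    pmf <- c(pmf * (1 - p), 0) + c(0, pmf * p)
}

p_hit <- pmf[target + 1]      ## chance that one trial has exactly `target` 1s
expected_trials <- 1 / p_hit  ## mean number of trials until the first hit
p_any_in_chunk <- 1 - (1 - p_hit)^100  ## chance a chunk of 100 has at least one hit

p_hit
expected_trials
p_any_in_chunk
```

If expected_trials is much larger than chunkSize, most chunks will come back empty, which is an argument for a larger chunkSize.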
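I think you want to resample from the binomial distribution with a given set of probabilities until you hit your target value of 5, is that right? If so, then I think this does what you want. A while loop can be used to iterate until the condition is met. If you provide very unrealistic probabilities and target values, I guess it could turn into a runaway function, so consider yourself warned :)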
FOO <- function(probs, target) {
    out <- rbinom(length(probs), 1, probs)
    while (sum(out) != target) {
        out <- rbinom(length(probs), 1, probs)
    }
    return(out)
}
> FOO(probs, target = 5)
[1] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0
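One way to guard against the runaway case is to cap the number of attempts. This is only a sketch of that idea: FOO_bounded and its max_tries argument are hypothetical names, and the default of 1e5 is an arbitrary assumption rather than a recommended value.

```r
## Hypothetical bounded variant of FOO(): gives up after max_tries draws
FOO_bounded <- function(probs, target, max_tries = 1e5) {
    for (i in seq_len(max_tries)) {
        out <- rbinom(length(probs), 1, probs)
        if (sum(out) == target) return(out)  ## found a conforming draw
    }
    stop("no draw with ", target, " ones in ", max_tries, " tries")
}

## A feasible target returns a conforming vector
set.seed(1)
y <- FOO_bounded(rep(0.5, 4), target = 2)

## An impossible target (3 ones in a vector of length 2) stops with an error
## instead of looping forever
res <- tryCatch(FOO_bounded(c(0.5, 0.5), target = 3, max_tries = 100),
                error = function(e) "gave up")
```

Here the error is turned into the string "gave up" by tryCatch; in real use you would probably let the error propagate or handle it where the function is called.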