Assigning a specific number of values according to a probability distribution (in R)

Lau*_*ura 7 r vector probability

Hello, and thanks in advance for your help!

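I am trying to generate a vector with a specific number of values, where those values are assigned according to a probability distribution. For example, I want a vector of length 31 containing 26 zeros and 5 ones. (The sum of the vector should always be five.) However, the positions of the ones matter. To decide which values should be 1 and which should be 0, I have a vector of probabilities (also of length 31), as follows: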

probs<-c(0.01,0.02,0.01,0.02,0.01,0.01,0.01,0.04,0.01,0.01,0.12,0.01,0.02,0.01,
0.14,0.06,0.01,0.01,0.01,0.01,0.01,0.14,0.01,0.07,0.01,0.01,0.04,0.08,0.01,0.02,0.01)

I can choose values according to this distribution and get a vector of length 31 using rbinom, but I cannot make it select exactly five values.

Inv=rbinom(length(probs),1,probs)
Inv
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0

Any ideas?

Thanks again!

Jam*_*mes 10

How about using a weighted sample.int to choose the positions?

Inv<-integer(31)
Inv[sample.int(31,5,prob=probs)]<-1
Inv
[1] 0 0 0 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
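The same idea can be wrapped in a small reusable function. This is only a sketch: the name place_ones() is hypothetical and not part of the answer above. One caveat worth knowing: according to R's documentation for sample(), when replace = FALSE the prob weights are applied sequentially (each next position is chosen in proportion to the weights of the positions not yet chosen). The result is therefore not, in general, the same distribution as drawing independent rbinom() trials and keeping only those with exactly five 1s.

```r
## Hypothetical helper wrapping the sample.int() approach above:
## returns a 0/1 integer vector with exactly k ones, whose positions are
## drawn by weighted sampling without replacement using `probs`.
place_ones <- function(probs, k) {
  out <- integer(length(probs))                           # start with all zeros
  out[sample.int(length(probs), k, prob = probs)] <- 1L   # k weighted positions
  out
}

## Usage with a hypothetical uniform probability vector of length 31
x <- place_ones(rep(1 / 31, 31), 5)
sum(x)  # always 5
```

With the question's probs, place_ones(probs, 5) gives a length-31 vector with exactly five 1s in a single call, with no retry loop.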


Rei*_*son 7

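Chase provided a nice answer and mentioned the problem of a runaway while() loop. One problem with a runaway while() is that, if you do this one trial at a time and it takes many trials, say t, to find one that matches the target number of 1s, you also incur the overhead of t calls to the main function, rbinom() in this case.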

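There is a way out, however. Because rbinom(), like all of R's (pseudo-)random number generators, is vectorised, we can generate m trials at a time and check those m trials for conformance to the requirement of five 1s. If none is found, we repeatedly draw m trials until we find one that conforms. This idea is implemented in the function foo() below. The chunkSize argument is m, the number of trials to draw at a time. I also took the opportunity to allow the function to find more than one conforming trial; the argument n controls how many conforming trials are returned.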
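The vectorised draw that foo() relies on can be seen on a small scale. This sketch uses a hypothetical three-element probability vector probs3: rbinom() recycles its prob argument over the whole result, so filling a matrix by row with ncol = length(probs3) puts one complete trial in each row, with column j always drawn with probability probs3[j].

```r
## Small illustration of the chunked draw used in foo():
## probs3 is a hypothetical 3-element probability vector.
probs3 <- c(0.1, 0.5, 0.9)
m <- 4                                   # number of trials drawn in one call
## rbinom() recycles probs3 over all length(probs3) * m draws, so filling
## the matrix by row gives one complete trial per row
trial <- matrix(rbinom(length(probs3) * m, 1, probs3),
                ncol = length(probs3), byrow = TRUE)
rs <- rowSums(trial)                     # number of 1s in each trial
which(rs == 2)                           # rows (trials) with exactly two 1s
```

A single rbinom() call thus replaces m separate calls, which is where the saving over a one-trial-at-a-time loop comes from.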

foo <- function(probs, target, n = 1, chunkSize = 100) {
    len <- length(probs)
    out <- matrix(ncol = len, nrow = 0) ## return object
    ## draw chunkSize trials; each row of `trial` is one trial
    trial <- matrix(rbinom(len * chunkSize, 1, probs),
                    ncol = len, byrow = TRUE)
    rs <- rowSums(trial)       ## how many `1`s in each trial
    ok <- which(rs == target)  ## which trials meet the `target`
    found <- length(ok)        ## how many meet the target
    if(found > 0)              ## if we found some, add them to out
        out <- rbind(out,
                     trial[ok, , drop = FALSE][seq_len(min(n, found)), ,
                                               drop = FALSE])
    ## if we haven't found enough, repeat the whole thing until we do
    while(found < n) {
        trial <- matrix(rbinom(len * chunkSize, 1, probs),
                        ncol = len, byrow = TRUE)
        rs <- rowSums(trial)
        ok <- which(rs == target)
        New <- length(ok)
        if(New > 0) {
            take <- min(n - found, New)  ## keep only as many as still needed
            out <- rbind(out, trial[ok, , drop = FALSE][seq_len(take), ,
                                                        drop = FALSE])
            found <- found + take
        }
    }
    if(n == 1L)           ## comment this, and
        out <- drop(out)  ## this if you don't want dimension dropping
    out
}

It works like this:

> set.seed(1)
> foo(probs, target = 5)
 [1] 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0
[31] 0
> foo(probs, target = 5, n = 2)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]    0    0    0    0    0    0    0    0    0     0     0
[2,]    0    0    0    0    0    0    0    0    0     0     1
     [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21]
[1,]     0     0     0     1     1     0     0     0     0     0
[2,]     0     1     0     0     1     0     0     0     0     0
     [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31]
[1,]     1     0     1     0     0     0     1     0     0     0
[2,]     1     0     1     0     0     0     0     0     0     0

Note that I drop the empty dimension when n == 1. Comment out the final if block if you don't want this behaviour.

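You need to balance the size of chunkSize against the computational burden of checking that many trials at once. If the requirement (here, five 1s) is very unlikely, increase chunkSize so that you make fewer calls to rbinom(). If the requirement is likely, there is little point in drawing a large chunk of trials when you only want one or two, because every trial you draw has to be evaluated.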
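To judge how unlikely the requirement is before choosing chunkSize, one option is to compute the exact probability that a single trial contains the target number of 1s. The sum of independent Bernoulli draws with different probabilities follows the Poisson-binomial distribution, whose probabilities can be built up one position at a time. The helper p_sum_equals() below is a hypothetical sketch, not part of the answer above.

```r
## Hypothetical helper: exact probability that independent Bernoulli(probs)
## draws sum to `target` (the Poisson-binomial distribution).
p_sum_equals <- function(probs, target) {
  if (target < 0 || target > length(probs)) return(0)
  dist <- 1                     # dist[k + 1] = P(sum = k); start with no draws
  for (p in probs) {
    ## each new draw either keeps the count at k (prob 1 - p)
    ## or moves it to k + 1 (prob p)
    dist <- c(dist * (1 - p), 0) + c(0, dist * p)
  }
  dist[target + 1]
}

p_sum_equals(c(0.5, 0.5), 1)  # 0.5
```

For the question's probs, 1 / p_sum_equals(probs, 5) is the expected number of single trials needed per accepted vector. Since those probabilities sum to only 0.96, an average trial contains fewer than one 1, so five 1s is a rare outcome and a larger chunkSize is likely to pay off.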


Cha*_*ase 5

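I think you want to keep resampling from the binomial distribution with a given set of probabilities until you hit your target value of 5, yes? If so, I think this does what you want. A while loop can be used to iterate until a condition is met. If you supply very unrealistic probabilities and target values, I guess it could turn into a runaway function, so consider yourself warned :)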

FOO <- function(probs, target) {
  out <- rbinom(length(probs), 1, probs)

  while (sum(out) != target) {

    out <- rbinom(length(probs), 1, probs)
  }
  return(out)
}

> FOO(probs, target = 5)  
 [1] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0
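The runaway-loop warning above can be addressed by capping the number of attempts. The sketch below is a hypothetical variant (the name FOO_capped() and the maxIter argument are not from the answer) that gives up with an error instead of looping forever, for example when target can never be reached.

```r
## Hypothetical variant of FOO(): same rejection loop, but it stops with an
## error after maxIter unsuccessful attempts instead of looping forever.
FOO_capped <- function(probs, target, maxIter = 1e5) {
  for (i in seq_len(maxIter)) {
    out <- rbinom(length(probs), 1, probs)   # one independent trial
    if (sum(out) == target) return(out)      # accept the first match
  }
  stop("no trial summed to target after ", maxIter, " attempts")
}
```

For example, FOO_capped(rep(0, 5), 1, maxIter = 10) stops with an error after ten attempts, whereas FOO(rep(0, 5), 1) would never return.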