-1 到 1 之间的随机数总和为 0

Sté*_*ent 11 random algorithm r

使用 R,如何生成位于和 之间的n随机数x_1, ...,其总和为?x_n-110

推广到另一个总和和另一个范围怎么样?

Sté*_*ent 11

简单解决第一个问题

这是第一个问题的简单解决方案。模拟u_1, ...,和u_n之间。然后设置、、、...、。-11x_1 = (u_1-u_n)/2x_2 = (u_2-u_1)/2x_3 = (u_3-u_2)/2x_n = (u_n-u_{n-1})/2

n <- 10L
u <- runif(n, -1, 1)
x <- c(u[1L]-u[n], diff(u)) / 2
sum(x)
summary(x)
Run Code Online (Sandbox Code Playgroud)

推广到任意总和

现在,如何生成位于和 之间的n随机数x_1, ...,以及给定数字(在 和 之间)的总和?x_n-11s-nn

像以前一样继续,但减去s/nu_i添加s/nx_i

s <- 3
n <- 10L
u <- runif(n, -1, 1) - s/n
x <- c(u[1L]-u[n], diff(u))/2 + s/n
sum(x)
summary(x)
Run Code Online (Sandbox Code Playgroud)

泛化到任意范围?

现在,如何生成位于两个给定数字和之间的n随机数x_1,并且该数字之和为给定数字(在和之间)?x_nabsn*an*b

前面的方法可以推广到以下情况a = -b

a <- -4; b <- 4; s <- 10
n <- 10L
u <- runif(n, -(b-a)/2, (b-a)/2) - s/n
x <- c(u[1L]-u[n], diff(u))/2 + s/n
sum(x)
summary(x)
Run Code Online (Sandbox Code Playgroud)

最后是DRS算法

狄利克雷重缩放算法解决了任意数字对的最后一个问题a < b

此外,模拟向量在维流形上(x_1, ..., x_n)具有均匀分布(n-1){sum x_i = s}a <= x_i <= b

这是一个 R 实现,改编自 Roger Stafford 编写的 Matlab 实现(代码中给出的参考)。

# adapted from Roger Stafford's Matlab implementation
# Roger Stafford (2023). Random Vectors with Fixed Sum (https://www.mathworks.com/matlabcentral/fileexchange/9700-random-vectors-with-fixed-sum), MATLAB Central File Exchange. Retrieved March 23, 2023.
DRS <- function(n, s, a, b) {
  n <- as.integer(n)
  if(s < n*a || s > n*b || a >= b) {
    stop("Invalid parameters.")
  }
  # Rescale to a unit cube: 0 <= x(i) <= 1
  s <- (s - n*a) / (b - a)
  # Construct the transition probability table, t.
  #   t(i,j) will be used only in the region where j <= i + 1.
  k <- max(min(as.integer(floor(s)), n-1L), 0L) # Must have 0 <= k <= n-1
  s <- max(min(s, k+1L), k)                     # Must have k <= s <= k+1
  s1 <- s - (k:(k-n+1L))   # s1 will never be negative
  s2 <- ((k+n):(k+1L)) - s # s2 will never be negative
  w <- matrix(0, nrow = n, ncol = n+1L)
  w[1L, 2L] <- .Machine$double.xmax # Scale for full 'double' range
  t <- matrix(0, nrow = n-1L, ncol = n)
  tiny <- .Machine$double.eps # The smallest positive 'double' 
  for(i in 2L:n) {
    tmp1 <- w[i-1L, 2L:(i+1L)] * s1[1L:i] / i
    tmp2 <- w[i-1L, 1L:i] * s2[(n-i+1L):n] / i
    w[i, 2L:(i+1L)] <- tmp1 + tmp2
    tmp3 <- w[i, 2L:(i+1L)] + tiny               # In case tmp1 & tmp2 are both 0,
    tmp4 <- as.double(s2[(n-i+1L):n] > s1[1L:i]) #   then t is 0 on left & 1 on right
    t[i-1L, 1L:i] <- (tmp2 / tmp3) * tmp4 + (1 - tmp1 / tmp3) * (1 - tmp4)
  }
  # Derive the polytope volume v from the appropriate element in the bottom row of w.
  v <- n^(3/2) * (w[n, k+2L] / .Machine$double.xmax) * (b - a)^(n - 1L)
  # Now construct the vector x.
  x <- numeric(n)
  rt <- runif(n - 1L) # For random selection of simplex type
  rs <- runif(n - 1L) # For random location within a simplex
  j <- k + 1L # For indexing in the t table
  sm <- 0 # Start with sum zero
  pr <- 1 # Start with product 1
  for(i in (n-1L):1L) { # Work backwards in the t table
    e <- as.double(rt[n-i] <= t[i, j]) # Use rt to choose a transition
    sx <- rs[n-i] ^ (1L/i)             # Use rs to compute next simplex coordinate
    sm <- sm + (1 - sx) * pr * s / (i + 1L) # Update sum
    pr <- sx * pr                           # Update product
    x[n-i] <- sm + pr * e # Calculate x using simplex coordinates
    s <- s - e
    j <- j - e # Transition adjustment
  }
  x[n] <- sm + pr * s # Compute the last x
  # Randomly permute the order in x and rescale
  p <- order(runif(n)) # it is a random permutation
  a + (b - a) * x[p]
}
Run Code Online (Sandbox Code Playgroud)

例子:

x <- DRS(n = 10L, s = 14, a = 1, b = 2)
sum(x)
summary(x)
Run Code Online (Sandbox Code Playgroud)

编辑

出了问题:这不是DRS - 请参阅给定链接中的幻灯片。DRS 更普遍地允许x_i.


编辑

我刚刚发现这个方法是在 R 包Surrogate中实现的。

  • 我认为你的解决方案看起来太复杂了——@Thomasiscoding 删除的解决方案的一个版本会更简单。但你对分布的描述是错误的。例如 `x &lt;- DRS(n = 10, s = 10.1, a = 1, b = 2)` 在 `{sum x_i = s}` 上并不统一,有一些上限和下限(它们是范围“[a,b]”,不等于它)。 (3认同)

Tho*_*ing 8

首先,我想说,下面的方法只是约束随机性的可能实现,但对于均匀分布属性或类似的属性来说并不安全。


  • 递归

遵循递归思想,我们可以r首先生成更多的边界约束,这可能会加速很多并且效率更高

f <- function(s, n, a, b) {
  if (s < n * a || s > n * b) {
    stop("Invalid parameters.")
  }
  if (n == 1) {
    return(s)
  }
  r <- runif(1, max(a, s - (n - 1) * b), min(b, s - (n - 1) * a))
  c(r, Recall(s - r, n - 1, a, b))
}
Run Code Online (Sandbox Code Playgroud)

我们可以看到

> (v <- f(s = 60, n = 30, a = 1, b = 3))
 [1] 1.544962 1.229845 2.013064 1.510149 2.933672 1.782947 1.650229 2.700521
 [9] 1.151468 1.758759 2.035019 1.355591 2.731922 2.918394 2.288166 2.198345
[17] 1.313646 2.312720 1.232810 1.591426 1.020105 2.788073 1.208734 2.929171
[25] 1.397976 2.044319 1.593190 2.961647 2.849886 2.953244

> summary(v)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.002   1.387   2.126   2.000   2.585   2.892

> length(v)
[1] 30

> sum(v)
[1] 60
Run Code Online (Sandbox Code Playgroud)
  • 拒绝抽样

一种强力(低效)但懒惰的方法是拒绝采样

f <- function(s, n, a, b) {
  repeat {
    v <- runif(n, a, b)
    x <- s * v / sum(v)
    if (all(x >= a & x <= b)) break
  }
  x
}
Run Code Online (Sandbox Code Playgroud)

这样

> (v <- f(s = 60, n = 30, a = 1, b = 3))
 [1] 1.800257 1.706306 2.855300 2.177379 2.844279 2.293842 1.011653 2.820371
 [9] 2.803390 2.427355 1.892209 1.829180 2.240873 1.641185 2.267275 1.899986
[17] 1.042455 1.400519 2.612722 1.018635 2.024762 1.413173 1.376111 2.685723
[25] 1.886224 2.151509 1.598368 1.114850 2.303480 2.860629

> summary(v)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.012   1.609   1.962   2.000   2.396   2.861

> length(v)
[1] 30

> sum(v)
[1] 60
Run Code Online (Sandbox Code Playgroud)


Dav*_*tat 6

Roger Stafford\xe2\x80\x99s 算法不处理不同的边界,因为\n巧妙地利用相同边界情况的对称性在最坏的情况下\n有效地采样。

\n

狄利克雷重缩放 (DRS) 算法可以处理不同的边界,但本质上是通过一种技巧来生成和测试(在最坏的情况下效率不高)。诀窍是,当我们\xe2\x80\x99重新采样\nx i \xe2\x88\x88 [a i , b i ] 时,\n\xe2\x88\x91x i = s,我们可以

\n
    \n
  • 使用\xe2\x88\ x91x i = s\n 生成 x i \xe2\x88\x92 a i \xe2\x89\xa5 0 并测试 x i \xe2\x89\xa4 b i
  • \n
  • 使用\xe2\x88\ x91x i = s\n生成 b i \xe2\x88\x92 x i \xe2\x89\xa5 0 并测试 a i \xe2\x89\xa4 x i
  • \n
\n

如果 \xe2\x88\x91(a i + b i )/2 < s,则第一个选项更好,\n如果 \xe2\x88\x91(a i + b i )/2 >\ns,则第二个选项更好。(最初的 DRS 演示要复杂得多。)

\n

DRS 的另一个特点是它努力不消耗\n超出其需要的随机性,但在我看来这是一个错误,因为\n多次迭代样本可能会出现奇怪的浮点效应,\n而随机数只是\xe2\x80\x99t 那么贵。

\n

在(测试不充分的)R 中:

\n
DRS_upper <- function(s, u) {\n    if (length(u) == 0 || any(u <= 0)) {\n        stop("Invalid parameters.")\n    }\n    repeat {\n        x <- diff(c(0, sort(runif(length(u) - 1, 0, s)), s))\n        if (all(x <= u)) {\n            return(x)\n        }\n    }\n}\n\nDRS <- function(s, a, b) {\n    if (length(a) == 0 || length(b) == 0 || length(a) != length(b) || any(a >= b) ||\n        s <= sum(a) || sum(b) <= s) {\n        stop("Invalid parameters.")\n    }\n    if (s - sum(a) < sum(b) - s) {\n        return(DRS_upper(s - sum(a), b - a) + a)\n    }\n    return(b - DRS_upper(sum(b) - s, b - a))\n}\n\n# Example\nDRS(10, c(1, 2, 3), c(3, 4, 5))\n
Run Code Online (Sandbox Code Playgroud)\n