Sté*_*ent 11 random algorithm r
使用 R,如何生成位于和 之间的n
随机数x_1
, ...,其总和为?x_n
-1
1
0
推广到另一个总和和另一个范围怎么样?
Sté*_*ent 11
这是第一个问题的简单解决方案。模拟u_1
, ...,和u_n
之间。然后设置、、、...、。-1
1
x_1 = (u_1-u_n)/2
x_2 = (u_2-u_1)/2
x_3 = (u_3-u_2)/2
x_n = (u_n-u_{n-1})/2
n <- 10L
u <- runif(n, -1, 1)
x <- c(u[1L]-u[n], diff(u)) / 2
sum(x)
summary(x)
Run Code Online (Sandbox Code Playgroud)
n
随机数x_1
, ...,以及给定数字(在 和 之间)的总和?x_n
-1
1
s
-n
n
像以前一样继续,但减去s/n
并u_i
添加s/n
到x_i
:
s <- 3
n <- 10L
u <- runif(n, -1, 1) - s/n
x <- c(u[1L]-u[n], diff(u))/2 + s/n
sum(x)
summary(x)
Run Code Online (Sandbox Code Playgroud)
n
随机数x_1
,并且该数字之和为给定数字(在和之间)?x_n
a
b
s
n*a
n*b
前面的方法可以推广到以下情况a = -b
:
a <- -4; b <- 4; s <- 10
n <- 10L
u <- runif(n, -(b-a)/2, (b-a)/2) - s/n
x <- c(u[1L]-u[n], diff(u))/2 + s/n
sum(x)
summary(x)
Run Code Online (Sandbox Code Playgroud)
狄利克雷重缩放算法解决了任意数字对的最后一个问题a < b
。
此外,模拟向量在维流形上(x_1, ..., x_n)
具有均匀分布。(n-1)
{sum x_i = s}
a <= x_i <= b
这是一个 R 实现,改编自 Roger Stafford 编写的 Matlab 实现(代码中给出的参考)。
# adapted from Roger Stafford's Matlab implementation
# Roger Stafford (2023). Random Vectors with Fixed Sum (https://www.mathworks.com/matlabcentral/fileexchange/9700-random-vectors-with-fixed-sum), MATLAB Central File Exchange. Retrieved March 23, 2023.
DRS <- function(n, s, a, b) {
n <- as.integer(n)
if(s < n*a || s > n*b || a >= b) {
stop("Invalid parameters.")
}
# Rescale to a unit cube: 0 <= x(i) <= 1
s <- (s - n*a) / (b - a)
# Construct the transition probability table, t.
# t(i,j) will be used only in the region where j <= i + 1.
k <- max(min(as.integer(floor(s)), n-1L), 0L) # Must have 0 <= k <= n-1
s <- max(min(s, k+1L), k) # Must have k <= s <= k+1
s1 <- s - (k:(k-n+1L)) # s1 will never be negative
s2 <- ((k+n):(k+1L)) - s # s2 will never be negative
w <- matrix(0, nrow = n, ncol = n+1L)
w[1L, 2L] <- .Machine$double.xmax # Scale for full 'double' range
t <- matrix(0, nrow = n-1L, ncol = n)
tiny <- .Machine$double.eps # The smallest positive 'double'
for(i in 2L:n) {
tmp1 <- w[i-1L, 2L:(i+1L)] * s1[1L:i] / i
tmp2 <- w[i-1L, 1L:i] * s2[(n-i+1L):n] / i
w[i, 2L:(i+1L)] <- tmp1 + tmp2
tmp3 <- w[i, 2L:(i+1L)] + tiny # In case tmp1 & tmp2 are both 0,
tmp4 <- as.double(s2[(n-i+1L):n] > s1[1L:i]) # then t is 0 on left & 1 on right
t[i-1L, 1L:i] <- (tmp2 / tmp3) * tmp4 + (1 - tmp1 / tmp3) * (1 - tmp4)
}
# Derive the polytope volume v from the appropriate element in the bottom row of w.
v <- n^(3/2) * (w[n, k+2L] / .Machine$double.xmax) * (b - a)^(n - 1L)
# Now construct the vector x.
x <- numeric(n)
rt <- runif(n - 1L) # For random selection of simplex type
rs <- runif(n - 1L) # For random location within a simplex
j <- k + 1L # For indexing in the t table
sm <- 0 # Start with sum zero
pr <- 1 # Start with product 1
for(i in (n-1L):1L) { # Work backwards in the t table
e <- as.double(rt[n-i] <= t[i, j]) # Use rt to choose a transition
sx <- rs[n-i] ^ (1L/i) # Use rs to compute next simplex coordinate
sm <- sm + (1 - sx) * pr * s / (i + 1L) # Update sum
pr <- sx * pr # Update product
x[n-i] <- sm + pr * e # Calculate x using simplex coordinates
s <- s - e
j <- j - e # Transition adjustment
}
x[n] <- sm + pr * s # Compute the last x
# Randomly permute the order in x and rescale
p <- order(runif(n)) # it is a random permutation
a + (b - a) * x[p]
}
Run Code Online (Sandbox Code Playgroud)
例子:
x <- DRS(n = 10L, s = 14, a = 1, b = 2)
sum(x)
summary(x)
Run Code Online (Sandbox Code Playgroud)
出了问题:这不是DRS - 请参阅给定链接中的幻灯片。DRS 更普遍地允许x_i
.
我刚刚发现这个方法是在 R 包Surrogate中实现的。
首先,我想说,下面的方法只是约束随机性的可能实现,但对于均匀分布属性或类似的属性来说并不安全。
遵循递归思想,我们可以r
首先生成更多的边界约束,这可能会加速很多并且效率更高
f <- function(s, n, a, b) {
if (s < n * a || s > n * b) {
stop("Invalid parameters.")
}
if (n == 1) {
return(s)
}
r <- runif(1, max(a, s - (n - 1) * b), min(b, s - (n - 1) * a))
c(r, Recall(s - r, n - 1, a, b))
}
Run Code Online (Sandbox Code Playgroud)
我们可以看到
> (v <- f(s = 60, n = 30, a = 1, b = 3))
[1] 1.544962 1.229845 2.013064 1.510149 2.933672 1.782947 1.650229 2.700521
[9] 1.151468 1.758759 2.035019 1.355591 2.731922 2.918394 2.288166 2.198345
[17] 1.313646 2.312720 1.232810 1.591426 1.020105 2.788073 1.208734 2.929171
[25] 1.397976 2.044319 1.593190 2.961647 2.849886 2.953244
> summary(v)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.002 1.387 2.126 2.000 2.585 2.892
> length(v)
[1] 30
> sum(v)
[1] 60
Run Code Online (Sandbox Code Playgroud)
一种强力(低效)但懒惰的方法是拒绝采样
f <- function(s, n, a, b) {
repeat {
v <- runif(n, a, b)
x <- s * v / sum(v)
if (all(x >= a & x <= b)) break
}
x
}
Run Code Online (Sandbox Code Playgroud)
这样
> (v <- f(s = 60, n = 30, a = 1, b = 3))
[1] 1.800257 1.706306 2.855300 2.177379 2.844279 2.293842 1.011653 2.820371
[9] 2.803390 2.427355 1.892209 1.829180 2.240873 1.641185 2.267275 1.899986
[17] 1.042455 1.400519 2.612722 1.018635 2.024762 1.413173 1.376111 2.685723
[25] 1.886224 2.151509 1.598368 1.114850 2.303480 2.860629
> summary(v)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.012 1.609 1.962 2.000 2.396 2.861
> length(v)
[1] 30
> sum(v)
[1] 60
Run Code Online (Sandbox Code Playgroud)
Roger Stafford\xe2\x80\x99s 算法不处理不同的边界,因为\n巧妙地利用相同边界情况的对称性在最坏的情况下\n有效地采样。
\n狄利克雷重缩放 (DRS) 算法可以处理不同的边界,但本质上是通过一种技巧来生成和测试(在最坏的情况下效率不高)。诀窍是,当我们\xe2\x80\x99重新采样\nx i \xe2\x88\x88 [a i , b i ] 时,\n\xe2\x88\x91x i = s,我们可以
\n如果 \xe2\x88\x91(a i + b i )/2 < s,则第一个选项更好,\n如果 \xe2\x88\x91(a i + b i )/2 >\ns,则第二个选项更好。(最初的 DRS 演示要复杂得多。)
\nDRS 的另一个特点是它努力不消耗\n超出其需要的随机性,但在我看来这是一个错误,因为\n多次迭代样本可能会出现奇怪的浮点效应,\n而随机数只是\xe2\x80\x99t 那么贵。
\n在(测试不充分的)R 中:
\nDRS_upper <- function(s, u) {\n if (length(u) == 0 || any(u <= 0)) {\n stop("Invalid parameters.")\n }\n repeat {\n x <- diff(c(0, sort(runif(length(u) - 1, 0, s)), s))\n if (all(x <= u)) {\n return(x)\n }\n }\n}\n\nDRS <- function(s, a, b) {\n if (length(a) == 0 || length(b) == 0 || length(a) != length(b) || any(a >= b) ||\n s <= sum(a) || sum(b) <= s) {\n stop("Invalid parameters.")\n }\n if (s - sum(a) < sum(b) - s) {\n return(DRS_upper(s - sum(a), b - a) + a)\n }\n return(b - DRS_upper(sum(b) - s, b - a))\n}\n\n# Example\nDRS(10, c(1, 2, 3), c(3, 4, 5))\n
Run Code Online (Sandbox Code Playgroud)\n