myc*_*iza 8 random algorithm r constraints matrix
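I am trying to create a random matrix in R. It needs to be a presence/absence matrix, so that all values in the matrix are either 0 or 1.
But I also need to specify the row and column totals, for example a 5x5 table where
I was hoping to use r2dtable(), but I don't think you can force this function to use only 0s and 1s. I tried r2dtable() but ended up with values higher than 1. Any ideas?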
Ben*_*ker 12
This is a classic (and difficult) problem in community ecology. The picante
package has a Monte Carlo algorithm based on Miklos & Podani (2004) that randomizes a binary matrix while preserving the margins. However, that algorithm assumes that you start with a binary matrix preserving the constraints; it then provides as many randomized matrices with those constraints as you like.
I couldn't think of an easy algorithm to generate a binary matrix (even non-random) with the constraints satisfied to use as a starting value, so I used brute force: I used r2dtable() to generate lots of matrices and figured that a few of them would be binary matrices.
Here we go ...
rt <- c(4, 2, 3, 5, 3)
ct <- c(5, 1, 5, 2, 4)
set.seed(101)
system.time(tt <- r2dtable(100000, rt, ct)) ## 0.06 seconds
w <- which(sapply(tt, max) == 1)
length(w) ## 36/100000 (0.036%) are binary
m0 <- tt[[w[1]]] ## pick the first one
If you don't care about efficiency you could stop there. If you need thousands of matrices satisfying the conditions, though, the second stage is better ...
Miklos and Podani's algorithm does 'trial swaps' (exchanging row/column pairs that preserve the row/column totals); by default, it does 1000 swaps to randomize a matrix.
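To make the idea of a trial swap concrete, here is a minimal base-R sketch of a single swap step. It is not picante's implementation: the function name trial_swap and the small example matrix are hypothetical, and the sketch only assumes the usual description of the method (pick two rows and two columns at random, and flip the 2x2 submatrix only if it is a "checkerboard").

```r
# One trial swap on a binary matrix m (hypothetical helper, not picante's code):
# pick two rows and two columns at random and flip the 2x2 submatrix only if it
# is a checkerboard (1 0 / 0 1 or 0 1 / 1 0). Such a flip leaves every row and
# column total unchanged; any other pattern is left alone.
trial_swap <- function(m) {
  i <- sample.int(nrow(m), 2)   # two distinct rows
  j <- sample.int(ncol(m), 2)   # two distinct columns
  sub <- m[i, j]
  is_checkerboard <- sub[1, 1] == sub[2, 2] &&
    sub[1, 2] == sub[2, 1] &&
    sub[1, 1] != sub[1, 2]
  if (is_checkerboard) {
    m[i, j] <- 1 - sub          # flip 0 <-> 1 inside the submatrix
  }
  m
}

# Small illustrative matrix (not the one from the question)
m <- matrix(c(1, 0, 1,
              0, 1, 0,
              1, 1, 0), nrow = 3, byrow = TRUE)

set.seed(1)
m2 <- m
for (k in 1:1000) m2 <- trial_swap(m2)

# The margins are preserved by construction
stopifnot(all(rowSums(m2) == rowSums(m)),
          all(colSums(m2) == colSums(m)))
```

Because a failed attempt still counts as a trial, the number of matrices actually changed is usually smaller than the number of trials.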
library(picante)
results <- list(m0)
nm <- 100000
pb <- txtProgressBar(max = nm, style = 3)
system.time(
  for (i in 1:nm) {
    setTxtProgressBar(pb, i)
    results[[i+1]] <- randomizeMatrix(results[[i]], null.model = "trialswap")
  }
)
This generates 10^5 matrices in 7 seconds on my machine (which would take about 20 times longer via r2dtable(), by my calculation).
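All 10^5 matrices generated this way, as well as all of the r2dtable matrices that satisfy the constraints (just 36 of them), are identical! (We compare them by collapsing each matrix into a single binary string ...) It is possible that only one matrix satisfies these constraints, or that there is a very small number of them in a large space, so that it is hard to get from one to another ...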
table(sapply(results, paste, collapse = ""))
## 1111100010111111001010111 
## 100001 
table(sapply(tt[w], paste, collapse = ""))
## 1111100010111111001010111 
## 36
The r2dtable() hack, however, will scale terribly. If your real problem is much bigger than 5x5 it will be impractical. (Simulated annealing or a genetic algorithm might work to find an initial matrix: start with a random matrix satisfying the row constraints and shuffle with an objective function equal to the squared deviation from the column constraints. I didn't bother to spend the time figuring it out, though.)
Miklos I. & Podani J. 2004. Randomization of presence-absence matrices: comments and new algorithms. Ecology 85:86-92.
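The local-search idea mentioned above (start from a matrix that satisfies the row totals, then shuffle to reduce the squared deviation from the column totals) could look roughly like the sketch below. This is an untuned illustration, not something from the answer: the function name find_binary_start, the fixed temperature temp, and the iteration cap max_iter are all assumptions, and it uses a fixed-temperature Metropolis rule rather than a proper annealing schedule. It is not guaranteed to reach an objective of 0.

```r
# Hypothetical helper: search for a 0/1 matrix with row totals rt and column
# totals ct. Each move shifts a single 1 within one row, so the row totals
# always stay exact; the objective is the squared deviation of the column totals.
find_binary_start <- function(rt, ct, max_iter = 1e5, temp = 0.5) {
  stopifnot(sum(rt) == sum(ct), all(rt <= length(ct)), all(ct <= length(rt)))
  nr <- length(rt)
  nc <- length(ct)

  # Start with a random matrix that satisfies the row totals
  m <- matrix(0, nr, nc)
  for (i in seq_len(nr)) m[i, sample.int(nc, rt[i])] <- 1

  d <- colSums(m) - ct                     # deviation from the column totals
  for (iter in seq_len(max_iter)) {
    if (all(d == 0)) break                 # all constraints met
    i <- sample.int(nr, 1)
    ones  <- which(m[i, ] == 1)
    zeros <- which(m[i, ] == 0)
    if (length(ones) == 0 || length(zeros) == 0) next
    a <- ones[sample.int(length(ones), 1)]    # column that gives up a 1
    b <- zeros[sample.int(length(zeros), 1)]  # column that receives it
    delta <- 2 * (d[b] - d[a] + 1)         # change in sum(d^2) for this move
    if (delta <= 0 || runif(1) < exp(-delta / temp)) {
      m[i, a] <- 0
      m[i, b] <- 1
      d[a] <- d[a] - 1
      d[b] <- d[b] + 1
    }
  }
  list(matrix = m, objective = sum(d^2))
}

rt <- c(4, 2, 3, 5, 3)
ct <- c(5, 1, 5, 2, 4)
set.seed(101)
res <- find_binary_start(rt, ct)
res$objective   # 0 means the column totals are met as well
```

If the returned objective is 0, res$matrix could serve as the starting matrix for the trial-swap stage; otherwise the search would need more iterations or a different temperature.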
Tho*_*ing 11
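igraph solution
Here is an option with igraph, which creates a random bipartite graph with rsum and csum as the degree sequences of the two sides. I believe it is more efficient and elegant than the CVXR version, and it also provides randomness.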
library(igraph)
rsum <- c(4, 2, 3, 5, 3)
csum <- c(5, 1, 5, 2, 4)
mat <- sample_degseq(
c(rsum, 0 * csum),
c(0 * rsum, csum),
method = "simple.no.multiple"
) %>%
set_vertex_attr(name = "type", value = V(.) > length(rsum)) %>%
get.incidence() %>%
unname()
which gives
> mat
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 1
[2,] 1 0 1 0 0
[3,] 1 0 1 0 1
[4,] 1 1 1 1 1
[5,] 1 0 1 0 1
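Since the generator is random, it can be worth checking that whatever comes back really is binary and has the requested margins. A small sketch of such a check follows; the helper name has_margins is hypothetical, and the matrix is simply the one printed above, typed in by hand.

```r
# Hypothetical helper: TRUE if m is a 0/1 matrix with the requested margins
has_margins <- function(m, rsum, csum) {
  all(m %in% c(0, 1)) &&
    all(rowSums(m) == rsum) &&
    all(colSums(m) == csum)
}

rsum <- c(4, 2, 3, 5, 3)
csum <- c(5, 1, 5, 2, 4)

# The matrix printed above, entered row by row
mat <- matrix(c(1, 0, 1, 1, 1,
                1, 0, 1, 0, 0,
                1, 0, 1, 0, 1,
                1, 1, 1, 1, 1,
                1, 0, 1, 0, 1), nrow = 5, byrow = TRUE)

has_margins(mat, rsum, csum)  # TRUE
```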
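CVXR solution
I guess you can formulate the problem as an optimization problem with row/column sum constraints, for which the CVXR package might be an option.
However, one drawback of this approach is that the resulting matrix is not random: you will always get the same result whenever you run the code below.
Code example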
library(CVXR)
# Define the size of the matrix and row/column sums
n <- 5 # size of the matrix
rsum <- c(4, 2, 3, 5, 3) # row sums
csum <- c(5, 1, 5, 2, 4) # column sums
# Create binary variables for each cell in the matrix
X <- Variable(n, n, boolean = TRUE)
# Define the objective function (minimize 0)
obj <- Minimize(0)
# Collect all constraints in a list
constrns <- list()
# Define the row and column sum constraints
for (i in 1:n) {
constrns[[length(constrns)+1]] <- sum(X[i, ]) == rsum[i]
}
for (j in 1:n) {
constrns[[length(constrns) + 1]] <- sum(X[, j]) == csum[j]
}
# Create the problem instance
problem <- Problem(obj, constrns)
# Solve the problem
result <- solve(problem)
mat <- round(result$getValue(X))
and you will get a matrix mat like this
> mat
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 1
[2,] 1 0 1 0 0
[3,] 1 0 1 0 1
[4,] 1 1 1 1 1
[5,] 1 0 1 0 1