Make a random presence/absence matrix with row and column constraints (so boolean)

myc*_*iza 8 random algorithm r constraints matrix

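I'm trying to create a random matrix in R. It needs to be a presence/absence matrix, so that all values in the matrix are either 0 or 1.

But I also need to specify the row and column totals, for example a 5x5 table where

  • the row totals are r1 = 4, r2 = 2, r3 = 3, r4 = 5, r5 = 3
  • the column totals are c1 = 5, c2 = 1, c3 = 5, c4 = 2, c5 = 4

I was hoping to use r2dtable(), but I don't think you can force that function to use only 0s and 1s. I tried r2dtable() but ended up with values above 1. Any ideas?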

Ben*_*ker 12

This is a classic (and difficult) problem in community ecology. The picante package has a Monte Carlo algorithm based on Miklos & Podani (2004) that randomizes a binary matrix while preserving the margins. However, that algorithm assumes that you start with a binary matrix preserving the constraints; it then provides as many randomized matrices with those constraints as you like.

I couldn't think of an easy algorithm to generate a binary matrix (even non-random) with the constraints satisfied to use as a starting value, so I used brute force: I used r2dtable() to generate lots of matrices and figured that a few of them would be binary matrices.

Here we go ...


r2dtable to generate a starting matrix

rt <- c(4, 2, 3, 5, 3)
ct <- c(5, 1, 5, 2, 4)
set.seed(101)
system.time(tt <- r2dtable(100000, rt, ct))  ## 0.06 seconds
w <- which(sapply(tt, max) == 1)
length(w) ## 36/100000 (0.036%) are binary
m0 <- tt[[w[1]]]  ## pick the first one

If you don't care about efficiency you could stop there. If you need thousands of matrices satisfying the conditions, though, the second stage is better ...

trial-swap shuffling

Miklos and Podani's algorithm does 'trial swaps' (exchanging row/column pairs that preserve the row/column totals); by default, it does 1000 swaps to randomize a matrix.

library(picante)
results <- list(m0)
nm <- 100000
pb <- txtProgressBar(max = nm, style = 3)
system.time(
  for (i in 1:nm) {
   setTxtProgressBar(pb, i)
   results[[i+1]] <- randomizeMatrix(results[[i]], null.model = "trialswap")
  }
)

This generates 10^5 matrices in 7 seconds on my machine (which would take about 20 times longer via r2dtable(), by my calculation).
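For intuition about what a trial swap does, here is a minimal base-R sketch of a single step. It is not picante's implementation, and `trial_swap` is a hypothetical helper name. It picks two rows and two columns at random and flips the 2x2 submatrix only if it is a "checkerboard" (ones on one diagonal, zeros on the other), which leaves every row and column total unchanged.

```r
# One trial swap on a 0/1 matrix (illustrative sketch, not picante's code).
trial_swap <- function(m) {
  rows <- sample(nrow(m), 2)   # two distinct random rows
  cols <- sample(ncol(m), 2)   # two distinct random columns
  sub <- m[rows, cols]
  # A checkerboard has equal diagonal entries, equal anti-diagonal entries,
  # and the two differ: (1 0 / 0 1) or (0 1 / 1 0).
  is_checkerboard <- sub[1, 1] == sub[2, 2] &&
    sub[1, 2] == sub[2, 1] &&
    sub[1, 1] != sub[1, 2]
  if (is_checkerboard) {
    m[rows, cols] <- 1 - sub   # flip; margins are preserved
  }
  m                            # unchanged if the trial fails
}

# Toy example (assumed data): shuffle a 3x3 identity matrix.
set.seed(1)
m <- diag(3)
m2 <- m
for (i in 1:1000) m2 <- trial_swap(m2)
rowSums(m2)
colSums(m2)
```

Failed trials still count as steps, which is why the chain can sit on the same matrix for a long time when few checkerboards are available.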


however ...


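All of the 10^5 matrices generated this way, and all of the r2dtable matrices that satisfy the constraints (only 36 of them), are identical! (We compare them by collapsing each whole matrix into a binary string ...) It's possible that only one matrix satisfies these constraints, or that there is a very small number of them in a large space, so that it's hard to get from one to another ...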

table(sapply(results, paste, collapse = ""))
## 1111100010111111001010111 
##                    100001 
table(sapply(tt[w], paste, collapse = ""))
## 1111100010111111001010111 
##                       36
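One way to probe the "only one matrix" possibility: a trial swap can only change the matrix when it finds a 2x2 checkerboard, so counting the checkerboards in the matrix is informative. The sketch below rebuilds the matrix from the binary string above (filled column by column, as paste() collapsed it) and uses a hypothetical helper, `count_checkerboards`. For this matrix the count should be 0, because the rows are nested (every row's set of ones contains that of each sparser row), which is consistent with a single matrix satisfying these margins.

```r
# Rebuild the 5x5 matrix from the collapsed string (column-major order).
bits <- "1111100010111111001010111"
m0 <- matrix(as.integer(strsplit(bits, "")[[1]]), nrow = 5)

# Count 2x2 checkerboard submatrices over all row pairs and column pairs.
# Hypothetical helper for illustration only.
count_checkerboards <- function(m) {
  n <- 0
  row_pairs <- combn(nrow(m), 2)
  col_pairs <- combn(ncol(m), 2)
  for (i in seq_len(ncol(row_pairs))) {
    for (j in seq_len(ncol(col_pairs))) {
      sub <- m[row_pairs[, i], col_pairs[, j]]
      if (sub[1, 1] == sub[2, 2] && sub[1, 2] == sub[2, 1] &&
          sub[1, 1] != sub[1, 2]) {
        n <- n + 1
      }
    }
  }
  n
}

rowSums(m0)               # 4 2 3 5 3
colSums(m0)               # 5 1 5 2 4
count_checkerboards(m0)   # 0: no swap can change this matrix
```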

final thoughts

  • If you already have a (possibly non-random) matrix that satisfies the constraints, or you have a reasonable algorithm for generating a single instance, then you're good to go.
  • The trial-swap algorithm should scale well (to larger sizes); the two examples presented by Miklos and Podani (2004) are 56x28 and 118x80. The r2dtable() hack, however, will scale terribly. If your real problem is much bigger than 5x5 it will be impractical. (Simulated annealing or a genetic algorithm might work to find an initial matrix: start with a random matrix satisfying the row constraints and shuffle with an objective function equal to the squared deviation from the column constraints. I didn't bother to spend the time figuring it out, though.)
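As a possible alternative to brute force for the starting matrix, a greedy fill in the spirit of the Gale-Ryser and Havel-Hakimi constructions may be worth trying: take the rows from largest to smallest total and put each row's ones in the columns with the largest remaining totals. The sketch below is an assumption-laden illustration, not something from picante or base R; `greedy_binary_matrix` is a hypothetical name. I believe it succeeds whenever a valid matrix exists, but treat that as an assumption and check the margins of the result.

```r
# Greedy construction of one 0/1 matrix with given margins (sketch).
greedy_binary_matrix <- function(rt, ct) {
  stopifnot(sum(rt) == sum(ct), all(rt <= length(ct)), all(ct <= length(rt)))
  m <- matrix(0L, nrow = length(rt), ncol = length(ct))
  remaining <- ct                       # column totals still to be filled
  for (i in order(rt, decreasing = TRUE)) {
    if (rt[i] == 0) next
    # columns with the largest remaining totals get this row's ones
    cols <- order(remaining, decreasing = TRUE)[seq_len(rt[i])]
    if (any(remaining[cols] <= 0)) {
      stop("greedy fill failed: margins may not admit a 0/1 matrix")
    }
    m[i, cols] <- 1L
    remaining[cols] <- remaining[cols] - 1
  }
  if (any(remaining != 0)) {
    stop("greedy fill failed: column totals not met")
  }
  m
}

rt <- c(4, 2, 3, 5, 3)
ct <- c(5, 1, 5, 2, 4)
m_start <- greedy_binary_matrix(rt, ct)
rowSums(m_start)   # should equal rt
colSums(m_start)   # should equal ct
```

The result is deterministic, so it is only a starting point; the trial-swap stage is still what provides the randomization.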

Miklos I. & Podani J. 2004. Randomization of presence-absence matrices: comments and new algorithms. Ecology 85: 86-92.



Tho*_*ing 11

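Update: igraph solution

Here is an option using igraph that creates a random bipartite graph with rsum and csum as the degree distributions of the two sides. I believe it is more efficient and elegant than the CVXR version, and it also provides randomness.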

library(igraph)

rsum <- c(4, 2, 3, 5, 3)
csum <- c(5, 1, 5, 2, 4)

mat <- sample_degseq(
  c(rsum, 0 * csum),
  c(0 * rsum, csum),
  method = "simple.no.multiple"
) %>%
  set_vertex_attr(name = "type", value = V(.) > length(rsum)) %>%
  get.incidence() %>%
  unname()

We get

> mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    1    1    1
[2,]    1    0    1    0    0
[3,]    1    0    1    0    1
[4,]    1    1    1    1    1
[5,]    1    0    1    0    1

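Previous answer: CVXR solution

I guess you can frame the problem as an optimization problem with row/column sum constraints, for which the package CVXR might be an option.

However, one drawback of this approach is that the resulting matrix is not random: you will always obtain the same result every time you run the code below.

Code example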

library(CVXR)

# Define the size of the matrix and row/column sums
n <- 5 # size of the matrix
rsum <- c(4, 2, 3, 5, 3) # row sums
csum <- c(5, 1, 5, 2, 4) # column sums

# Create binary variables for each cell in the matrix
X <- Variable(n, n, boolean = TRUE)

# Define the objective function (minimize 0)
obj <- Minimize(0)
# Start with an empty list of constraints
constrns <- list()

# Define the row and column sum constraints
for (i in 1:n) {
  constrns[[length(constrns)+1]] <- sum(X[i, ]) == rsum[i]
}
for (j in 1:n) {
  constrns[[length(constrns) + 1]] <- sum(X[, j]) == csum[j]
}

# Create the problem instance
problem <- Problem(obj, constrns)

# Solve the problem
result <- solve(problem)
mat <- round(result$getValue(X))

and you will obtain a matrix mat like this

> mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    1    1    1
[2,]    1    0    1    0    0
[3,]    1    0    1    0    1
[4,]    1    1    1    1    1
[5,]    1    0    1    0    1