Remove rows/columns with only one element from a binary matrix

Question

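I'm trying to remove "singletons" from a binary matrix. Here, a singleton is an element that is the only "1" value in both the row and the column in which it appears. For example, given the following matrix: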

> matrix(c(0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,1,1), nrow=6)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]    0    1    0    0    0    0    0
[2,]    1    0    1    0    0    0    0
[3,]    0    0    0    1    0    0    0
[4,]    1    1    0    0    0    0    0
[5,]    0    0    0    0    1    1    1
[6,]    0    0    0    0    1    0    1


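...I would like to remove row 3 (and, if possible, all of column 4), because the 1 in [3,4] is the only 1 in that row/column combination. [1,2] is fine, since there are other 1's in column [,2]; similarly, [2,3] is fine, since there are other 1's in row [2,]. Any help would be appreciated - thanks!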

Answer 1

cr1*_*ade 3

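You first want to find which rows and columns are singletons, and then check whether there are pairs of singleton rows and columns that share an index. Here is a short bit of code to accomplish this task: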

foo <- matrix(c(0,1,0,...))
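# rows and columns that contain exactly one 1
singRows <- which(rowSums(foo) == 1)
singCols <- which(colSums(foo) == 1)
# every (singleton row, singleton column) combination
singCombinations <- expand.grid(singRows, singCols)
# keep the combinations where the row's only 1 sits in that column
singPairs <- singCombinations[apply(singCombinations, 1,
    function(x) which(foo[x[1],] == 1) == x[2]),]
# drop the matched rows and columns
noSingFoo <- foo[-unique(singPairs[,1]), -unique(singPairs[,2])]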


With many singleton rows or columns you may need to make this more efficient, but it gets the job done.
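To see what each step produces, here is the same snippet run on the matrix from the question. This is only a walk-through sketch: the `isPair` variable is introduced here for readability and is not part of the code above.

```r
# Matrix from the question (6 rows, 7 columns), one column per line
foo <- matrix(c(0,1,0,1,0,0,
                1,0,0,1,0,0,
                0,1,0,0,0,0,
                0,0,1,0,0,0,
                0,0,0,0,1,1,
                0,0,0,0,1,0,
                0,0,0,0,1,1), nrow = 6)

singRows <- which(rowSums(foo) == 1)   # rows 1 and 3
singCols <- which(colSums(foo) == 1)   # columns 3, 4 and 6

# all 6 (row, column) combinations of the singleton rows and columns
singCombinations <- expand.grid(singRows, singCols)

# TRUE only where the row's single 1 lies in the candidate column
# (isPair is a name made up for this walk-through)
isPair <- apply(singCombinations, 1,
    function(x) which(foo[x[1], ] == 1) == x[2])
singPairs <- singCombinations[isPair, ]   # one pair: row 3, column 4

noSingFoo <- foo[-unique(singPairs[, 1]), -unique(singPairs[, 2])]
dim(noSingFoo)   # 5 6
```

Row 1 is a singleton row, but its 1 sits in column 2, which has a second 1 in row 4, so it is kept. Only the pair (3, 4) is removed.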

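UPDATE: Here is the more efficient version I knew could be done. This way you loop only over the rows (or the columns, if you prefer) rather than over all combinations, so it is much more efficient for matrices with many singleton rows/columns.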

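## starting with foo and singRows as before
# singleton rows whose single 1 is also the only 1 in its column
singPairRows <- singRows[sapply(singRows, function(singRow)
    sum(foo[,foo[singRow,] == 1]) == 1)]
# 2 x n matrix: row indices on top, matching column indices below
singPairs <- sapply(singPairRows, function(singRow)
    c(singRow, which(foo[singRow,] == 1)))
# drop the matched rows and columns
noSingFoo <- foo[-singPairs[1,], -singPairs[2,]]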
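One caveat with negative indexing: when no singleton pair exists, the index vectors are empty, and in R `x[-integer(0)]` selects nothing rather than everything. The following is a minimal sketch of the same row-loop idea that selects the rows and columns to keep instead. The function name `dropSingletonPairs` and the use of `colSums` for the column check are my own choices for illustration, not part of the code above.

```r
# Hypothetical helper: row-loop approach with positive "keep" indices,
# so a matrix without singleton pairs comes back unchanged
dropSingletonPairs <- function(m) {
  singRows <- which(rowSums(m) == 1)
  # column that holds the single 1 of each singleton row
  oneCols <- vapply(singRows, function(r) which(m[r, ] == 1), integer(1))
  # a pair needs that column to contain exactly one 1 as well
  isPair <- colSums(m)[oneCols] == 1
  keepRows <- setdiff(seq_len(nrow(m)), singRows[isPair])
  keepCols <- setdiff(seq_len(ncol(m)), oneCols[isPair])
  m[keepRows, keepCols, drop = FALSE]
}

# example with one singleton pair at [3,3]
m <- rbind(c(1, 1, 0),
           c(1, 1, 0),
           c(0, 0, 1))
withPair <- dropSingletonPairs(m)      # row 3 and column 3 removed

# example without any singleton: nothing is removed
ones <- matrix(1, nrow = 2, ncol = 2)
noPair <- dropSingletonPairs(ones)
```

`drop = FALSE` keeps the result a matrix even if only one row or column survives.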


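UPDATE 2: I compared the two methods (mine = nonsparse and @Chris's = sparse) using the rbenchmark package. I used a range of matrix sizes (from 10 to 1000 rows/columns; square matrices only) and levels of sparsity (from 0.1 to 5 non-zero entries per row/column). The heat map below shows relative performance as the log2 ratio of run times: white indicates equal performance, red means the sparse method is faster, and blue means the non-sparse method is faster. Note that I did not include the conversion to a sparse matrix in the performance calculation, so that will add some time to the sparse method. I just thought it was worth a little effort to see where this boundary lies.

[Figure: relative performance]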
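The sparse method from @Chris's answer is not reproduced here, so the sketch below only illustrates the kind of timing comparison described. It times the two non-sparse variants from this answer against each other, using base R's `system.time` in place of rbenchmark. The matrix size, density, seed, repetition count and the function names `pairsGrid` and `pairsRows` are all arbitrary assumptions, and the timings will differ from machine to machine.

```r
set.seed(1)   # arbitrary seed so the run is repeatable
n <- 300      # assumed size; the comparison above used 10 to 1000
# on average about one non-zero entry per row/column (assumed density)
foo <- matrix(rbinom(n * n, 1, 1 / n), nrow = n)

# pair detection over all singleton row/column combinations
pairsGrid <- function(m) {
  combos <- expand.grid(row = which(rowSums(m) == 1),
                        col = which(colSums(m) == 1))
  hit <- apply(combos, 1, function(x) which(m[x[1], ] == 1) == x[2])
  combos[hit, ]
}

# pair detection with a loop over the singleton rows only
pairsRows <- function(m) {
  singRows <- which(rowSums(m) == 1)
  oneCols <- vapply(singRows, function(r) which(m[r, ] == 1), integer(1))
  keep <- colSums(m)[oneCols] == 1
  data.frame(row = singRows[keep], col = oneCols[keep])
}

# elapsed time for 20 repetitions of each variant
tGrid <- system.time(for (i in 1:20) pGrid <- pairsGrid(foo))["elapsed"]
tRows <- system.time(for (i in 1:20) pRows <- pairsRows(foo))["elapsed"]

# log2 ratio of run times; positive means the row loop was faster
log2(tGrid / tRows)
```

Repeating this over a grid of `n` and density values would give the data for a heat map like the one described.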
