Set cover approximation in R

Question

mpa*_*nco 5 r linear-programming combinatorics set-cover

The unique elements in column n are:

unique(d$n)
[1] 1 2 3 4 5

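I want to find the smallest number of sets (column sets) that together cover all the unique elements in n (the universe). In this example, two sets are enough: s1 {1, 2, 3} and s4 {4, 5}. I have read about the problem on Wikipedia and elsewhere, and I know that a greedy algorithm can be used to find an approximation. I also checked this link, where two packages for this kind of problem are mentioned, lpSolve and Rsymphony, but I don't even know how to start. In my real-life case I have more than 40,000 sets, each with 1,000 to 10,000 elements, and the universe contains 80,000 unique elements.

Any help or guidance on how to start or proceed would be greatly appreciated.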

Data

d <- structure(list(sets = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 
4L, 4L), .Label = c("s1", "s2", "s3", "s4"), class = "factor"), 
    n = c(1, 2, 3, 2, 4, 3, 4, 4, 5)), .Names = c("sets", "n"
), row.names = c(NA, -9L), class = "data.frame")

Answer 1

42-*_*42- 4

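The lpSolve package is available on CRAN for solving linear programming problems. Use your link, which has a reply from the very reputable Hans Borchers, together with the slightly more complex example in http://math.mit.edu/~goemans/18434S06/setcover-tamara.pdf (starting at pages 4/5), as templates for understanding the correct structure of the setup, and then follow this modification of the first example in ?lp: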

library(lpSolve)
?lp
# In Details: "Note that every variable is assumed to be >= 0!"
# Go from the long-form representation of the sets to a wide-form
# matrix with one row per item and one column per set
items.mat <- t(table(d$sets, d$n))  # could have reversed order of args to skip t()
dimnames(items.mat) <- list(items = 1:5, sets = paste0("s", 1:4))
items.mat
#---------
#      sets
# items s1 s2 s3 s4
#     1  1  0  0  0
#     2  1  1  0  0
#     3  1  0  1  0
#     4  0  1  1  1
#     5  0  0  0  1
#---------
f.obj <- rep(1, 4)     # objective coefficients: a cost of 1 per set (column)
f.dir <- rep(">=", 5)  # the constraint "directions" by row
f.rhs <- rep(1, 5)     # the inequality values by row (require all items to be present)

# all.bin = TRUE restricts each variable to 0/1 (a set is either chosen or not);
# without it lp() solves the continuous relaxation, which is also integral here
lp("min", f.obj, items.mat, f.dir, f.rhs, all.bin = TRUE)$solution
# [1] 1 0 0 1

So sets s1 and s4 form a minimum cover. The "column coefficients" of the solution indicate which "sets" are selected.
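
For a problem as large as the one described in the question, an exact integer program may be slow, and the greedy approximation mentioned there is a common fallback. The following is only a minimal sketch of that idea in base R, not part of lpSolve: `greedy_set_cover` is a hypothetical helper name, and it assumes the same long-form layout as `d` (one row per set/element pair). At each step it picks the set that covers the most still-uncovered elements; the resulting cover is known to be within a factor of roughly ln(n) of the optimum, where n is the size of the universe.

```r
# Greedy set cover sketch (hypothetical helper, base R only).
# sets:  vector of set labels, one entry per (set, element) pair
# items: vector of elements, same length as `sets`
greedy_set_cover <- function(sets, items) {
  members   <- split(items, as.character(sets))  # named list: set label -> its elements
  uncovered <- unique(items)                     # elements still to be covered
  chosen    <- character(0)
  while (length(uncovered) > 0) {
    # how many still-uncovered elements each set would cover
    gain <- vapply(members, function(m) sum(uncovered %in% m), numeric(1))
    best <- names(gain)[which.max(gain)]         # ties go to the first set
    chosen    <- c(chosen, best)
    uncovered <- setdiff(uncovered, members[[best]])
  }
  chosen
}

# Same data as in the question, rebuilt here so the sketch is self-contained
d <- data.frame(sets = c("s1", "s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"),
                n    = c(1, 2, 3, 2, 4, 3, 4, 4, 5))
cover <- greedy_set_cover(d$sets, d$n)
cover
```

On the example data this picks s1 first (it covers three elements) and then s4, matching the lp() result. Recomputing every gain in each iteration is the simplest approach but not the fastest; for tens of thousands of sets a sparse incidence matrix would likely be a better representation.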
