How can I find the intersections between all possible pairs of sets in a 2-column table?

Question

I want to compute the overlap coefficient between sets. My data is a 2-column table, for example:

df_example <-
  tibble::tribble(~my_group, ~cities,
                   "foo",   "london",
                   "foo",   "paris",
                   "foo",   "rome",
                   "foo",   "tokyo",
                   "foo",   "oslo",
                   "bar",   "paris",
                   "bar",   "nyc",
                   "bar",   "rome",
                   "bar",   "munich",
                   "bar",   "warsaw",
                   "bar",   "sf",
                   "baz",   "milano",
                   "baz",   "oslo",
                   "baz",   "sf",
                   "baz",   "paris")

In df_example, I have 3 sets (namely foo, bar, and baz), and the members of each set are listed in cities.

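I would like to end up with a table that intersects every possible pair of sets and records the size of the smaller set in each pair. From that, the overlap coefficient of each pair can be computed.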

(overlap coefficient = number of common members / size of the smaller set)
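For a single pair, the formula can be checked by hand in base R. The following is a minimal sketch, with the two vectors copied from the foo and bar rows of df_example; the variable names are illustrative only.

```r
# Members of two of the sets, copied from df_example
foo <- c("london", "paris", "rome", "tokyo", "oslo")
bar <- c("paris", "nyc", "rome", "munich", "warsaw", "sf")

# Members shared by both sets
common <- intersect(foo, bar)

# Overlap coefficient = shared members / size of the smaller set
overlap_coeff <- length(common) / min(length(foo), length(bar))
overlap_coeff  # 2 / 5 = 0.4
```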

Desired output

## # A tibble: 3 × 4
##   combination n_intersected_members size_of_smaller_set overlap_coeff
##   <chr>                       <dbl>               <dbl>         <dbl>
## 1 foo*bar                         2                   5           0.4
## 2 foo*baz                         2                   4           0.5
## 3 bar*baz                         2                   4           0.5

Is there a reasonably simple way to accomplish this using dplyr verbs? I tried

df_example |>
  group_by(my_group) |>
  summarise(intersected = dplyr::intersect(cities))

But this obviously doesn't work, because dplyr::intersect() requires two vectors. Is there a way to get the desired output in a dplyr-oriented way?

Answer 1

Tho*_*ing 4

Here is a base R option using combn
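The call relies on two building blocks: split() turns the table into a named list with one character vector per group, and combn() with simplify = FALSE enumerates every pair of list elements while keeping their names. The sketch below shows them in isolation. The intermediate names sets and pairs are illustrative and not part of the answer, and df_example is rebuilt as a plain data.frame only so that the snippet is self-contained.

```r
# Same data as in the question, rebuilt compactly
df_example <- data.frame(
  my_group = rep(c("foo", "bar", "baz"), times = c(5, 6, 4)),
  cities = c("london", "paris", "rome", "tokyo", "oslo",
             "paris", "nyc", "rome", "munich", "warsaw", "sf",
             "milano", "oslo", "sf", "paris")
)

# One character vector of cities per group (groups sorted alphabetically)
sets <- split(df_example$cities, df_example$my_group)
lengths(sets)

# Every unordered pair of sets, each kept as a named list of length 2
pairs <- combn(sets, 2, simplify = FALSE)

# Labels of the pairs, built from the names carried by each element
pair_labels <- vapply(pairs, \(x) paste(names(x), collapse = "-"), character(1))
pair_labels
```

The full call then applies a function to each such pair and row-binds the resulting one-row data frames.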

do.call(
    rbind,
    combn(
        with(
            df_example,
            split(cities, my_group)
        ),
        2,
        \(x)
        transform(
            data.frame(
                combo = paste0(names(x), collapse = "-"),
                nrIntersect = sum(x[[1]] %in% x[[2]]),
                szSmallSet = min(lengths(x))
            ),
            olCoeff = nrIntersect / szSmallSet
        ),
        simplify = FALSE
    )
)

This gives

    combo nrIntersect szSmallSet olCoeff
1 bar-baz           2          4     0.5
2 bar-foo           2          5     0.4
3 baz-foo           2          4     0.5
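Since the question asks for something closer to dplyr, here is one possible sketch in that style. It is not part of the answer above. It assumes tidyr is available for expand_grid(), that each city appears at most once per group, and the helper names sets, a, b and result are made up for illustration. Column names loosely follow the desired output in the question.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt compactly
df_example <- tibble::tibble(
  my_group = rep(c("foo", "bar", "baz"), times = c(5, 6, 4)),
  cities = c("london", "paris", "rome", "tokyo", "oslo",
             "paris", "nyc", "rome", "munich", "warsaw", "sf",
             "milano", "oslo", "sf", "paris")
)

# Named list: one character vector of cities per group
sets <- split(df_example$cities, df_example$my_group)

result <- expand_grid(a = names(sets), b = names(sets)) |>
  filter(a < b) |>   # keep each unordered pair once, drop self-pairs
  rowwise() |>       # evaluate the expressions below one pair at a time
  mutate(
    combination = paste(a, b, sep = "*"),
    n_intersected_members = length(base::intersect(sets[[a]], sets[[b]])),
    size_of_smaller_set = min(length(sets[[a]]), length(sets[[b]])),
    overlap_coeff = n_intersected_members / size_of_smaller_set
  ) |>
  ungroup() |>
  select(combination, n_intersected_members, size_of_smaller_set, overlap_coeff)

result
```

Unlike a self-join on cities, this keeps pairs that share no members, with a coefficient of 0. The pairs come out in alphabetical order (bar*baz, bar*foo, baz*foo), matching the base R result.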
