How to find the intersection between all possible pairs of sets in a 2-column table?

Emm*_*man 5 r set-intersection dplyr

I want to compute the overlap coefficient between sets. My data is a 2-column table, for example:

df_example <-
  tibble::tribble(~my_group, ~cities,
                   "foo",   "london",
                   "foo",   "paris",
                   "foo",   "rome",
                   "foo",   "tokyo",
                   "foo",   "oslo",
                   "bar",   "paris",
                   "bar",   "nyc",
                   "bar",   "rome",
                   "bar",   "munich",
                   "bar",   "warsaw",
                   "bar",   "sf",
                   "baz",   "milano",
                   "baz",   "oslo",
                   "baz",   "sf",
                   "baz",   "paris")

In df_example, I have 3 sets (namely foo, bar, and baz), and the members of each set are listed in cities.

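I want to end up with a table that intersects every possible pair of sets and also gives the size of the smaller set in each pair. From these two numbers, the overlap coefficient can be computed for each pair of sets.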

(Overlap coefficient = number of common members / size of the smaller set)
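
To make the definition concrete, here is a minimal base R sketch for a single pair of sets. The helper name overlap_coeff is hypothetical (it does not appear in the post), and the two vectors are the foo and bar members copied from df_example.

```r
# Hypothetical helper, not from the original post:
# overlap coefficient of two vectors treated as sets.
overlap_coeff <- function(a, b) {
  a <- unique(a)  # drop duplicates so each vector behaves like a set
  b <- unique(b)
  length(intersect(a, b)) / min(length(a), length(b))
}

foo <- c("london", "paris", "rome", "tokyo", "oslo")
bar <- c("paris", "nyc", "rome", "munich", "warsaw", "sf")

# paris and rome are shared; the smaller set (foo) has 5 members
overlap_coeff(foo, bar)  # 2 / 5 = 0.4
```

Note that the sketch returns NaN if either set is empty, since the smaller set then has size 0.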

Desired output

## # A tibble: 3 × 4
##   combination n_instersected_members size_of_smaller_set  overlap_coeff
##   <chr>                        <dbl>               <dbl>          <dbl>
## 1 foo*bar                          2                   5           0.4
## 2 foo*baz                          3                   4           0.75
## 3 bar*baz                          2                   4           0.5

Is there a simple enough way to accomplish this with dplyr verbs? I tried:

df_example |>
  group_by(my_group) |>
  summarise(intersected = dplyr::intersect(cities))

But this obviously doesn't work, because dplyr::intersect() needs two vectors. Is there a way to get my desired output in a dplyr-oriented way?

Tho*_*ing 4

Here is a base R option using combn:

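do.call(
    rbind,
    combn(
        # split() turns the table into a named list: one vector of cities per group
        with(
            df_example,
            split(cities, my_group)
        ),
        2, # visit every pair of sets; x is a named list holding the two sets
        \(x)
        transform(
            data.frame(
                combo = paste0(names(x), collapse = "-"),
                nrIntersect = sum(x[[1]] %in% x[[2]]), # members shared by both sets
                szSmallSet = min(lengths(x)) # size of the smaller set
            ),
            olCoeff = nrIntersect / szSmallSet
        ),
        simplify = FALSE # keep a list of one-row data frames for rbind
    )
)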

This gives

    combo nrIntersect szSmallSet olCoeff
1 bar-baz           2          4     0.5
2 bar-foo           2          5     0.4
3 baz-foo           2          4     0.5
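
Since the question asks for dplyr verbs, the same pairwise idea could also be sketched in that style. This is an illustrative variant, not part of the answer above: the names sets, pair_names, set_a, set_b, members_a and members_b are assumptions made for the example, and combn is still used to enumerate the pairs of set names.

```r
library(dplyr)

df_example <- tibble::tribble(
  ~my_group, ~cities,
  "foo", "london", "foo", "paris", "foo", "rome", "foo", "tokyo", "foo", "oslo",
  "bar", "paris", "bar", "nyc", "bar", "rome", "bar", "munich", "bar", "warsaw",
  "bar", "sf",
  "baz", "milano", "baz", "oslo", "baz", "sf", "baz", "paris"
)

# One row per set, with its distinct members stored in a list-column
sets <- df_example |>
  group_by(my_group) |>
  summarise(members = list(unique(cities)))

# All unordered pairs of set names, in order of first appearance
pair_names <- combn(unique(df_example$my_group), 2)

result <- tibble(set_a = pair_names[1, ], set_b = pair_names[2, ]) |>
  left_join(sets, by = c("set_a" = "my_group")) |>
  rename(members_a = members) |>
  left_join(sets, by = c("set_b" = "my_group")) |>
  rename(members_b = members) |>
  rowwise() |> # work on one pair at a time; list-column cells are unwrapped
  mutate(
    combination = paste(set_a, set_b, sep = "*"),
    n_intersected_members = length(base::intersect(members_a, members_b)),
    size_of_smaller_set = min(length(members_a), length(members_b)),
    overlap_coeff = n_intersected_members / size_of_smaller_set
  ) |>
  ungroup() |>
  select(combination, n_intersected_members, size_of_smaller_set, overlap_coeff)

result
```

With df_example as given, foo and baz share only oslo and paris, so this sketch reports 2 common members and a coefficient of 0.5 for that pair, in line with the combn output.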