在 R 中查找所有可能的团队

luc*_*ano 14 r combinatorics combn

感觉这应该很简单,但我已经经历过堆栈溢出和combn帮助,但看不到解决方案。

下面的玩家需要组成 3 对 2 的队伍。我需要找到所有可能的队伍组合。例如,两个可能的团队是“Ross”、“Bobby”和“Casper”在一个团队中,“Max”和“Jake”在另一团队中。我该如何编码?

players <- c("Ross", "Bobby", "Max", "Casper", "Jake")
Run Code Online (Sandbox Code Playgroud)

Tho*_*ing 18

我认为关键的一步是从5名球员中随机选择3(或2)名产生第一队,剩下的自然分配到第二队。

也许你可以这样尝试

combn(players,
  3, 
  function(x) list(team1 = x, team2 = players[!players %in% x]),
  simplify = FALSE
)
Run Code Online (Sandbox Code Playgroud)

或者如果你不介意时间效率,你可以setdiff使用

combn(players,
    3,
    function(x) list(team1 = x, team2 = setdiff(players, x)),
    simplify = FALSE
  )
Run Code Online (Sandbox Code Playgroud)

并且其中任何一个都给出

[[1]]
[[1]]$team1
[1] "Ross"  "Bobby" "Max"

[[1]]$team2
[1] "Casper" "Jake"


[[2]]
[[2]]$team1
[1] "Ross"   "Bobby"  "Casper"

[[2]]$team2
[1] "Max"  "Jake"


[[3]]
[[3]]$team1
[1] "Ross"  "Bobby" "Jake"

[[3]]$team2
[1] "Max"    "Casper"


[[4]]
[[4]]$team1
[1] "Ross"   "Max"    "Casper"

[[4]]$team2
[1] "Bobby" "Jake"


[[5]]
[[5]]$team1
[1] "Ross" "Max"  "Jake"

[[5]]$team2
[1] "Bobby"  "Casper"


[[6]]
[[6]]$team1
[1] "Ross"   "Casper" "Jake"

[[6]]$team2
[1] "Bobby" "Max"


[[7]]
[[7]]$team1
[1] "Bobby"  "Max"    "Casper"

[[7]]$team2
[1] "Ross" "Jake"


[[8]]
[[8]]$team1
[1] "Bobby" "Max"   "Jake"

[[8]]$team2
[1] "Ross"   "Casper"


[[9]]
[[9]]$team1
[1] "Bobby"  "Casper" "Jake"

[[9]]$team2
[1] "Ross" "Max"


[[10]]
[[10]]$team1
[1] "Max"    "Casper" "Jake"

[[10]]$team2
[1] "Ross"  "Bobby"
Run Code Online (Sandbox Code Playgroud)

标杆管理

players <- c("Ross", "Bobby", "Max", "Casper", "Jake")

microbenchmark(
  f1 = combn(players,
    3,
    function(x) list(team1 = x, team2 = players[!players %in% x]),
    simplify = FALSE
  ),
  f2 = combn(players,
    3,
    function(x) list(team1 = x, team2 = setdiff(players, x)),
    simplify = FALSE
  )
)
Run Code Online (Sandbox Code Playgroud)

我们将会看到

Unit: microseconds
 expr    min      lq      mean   median       uq      max neval
   f1 26.200 28.0510  51.55586  29.4515  32.5015 1935.301   100
   f2 97.301 99.8505 119.44610 103.6010 111.1510 1162.501   100
Run Code Online (Sandbox Code Playgroud)

  • @KonradRudolph 我添加了一个基准 (2认同)
  • 好吧,我没想到会这样。感谢您添加基准。我必须思考为什么会发生这种情况。 (2认同)
  • @JosephWood 是的,我也看到了。这绝对不是一个最佳的实现(“match”和“duplicated”最终会做大量冗余的工作,而更好的实现只会完成一次这项工作)。 (2认同)

Jos*_*ood 7

RcppAlgos(我是作者)中有一个函数comboGroups专门为此任务而构建。从版本开始2.8.0,我们现在可以处理不同大小的组。

\n
library(RcppAlgos)\npackageVersion("RcppAlgos")\n#> [1] \'2.8.0\'\n
Run Code Online (Sandbox Code Playgroud)\n

我们使用以下参数指定每个团队的规模grpSizes

\n
players <- c("Ross", "Bobby", "Max", "Casper", "Jake")\ncomboGroups(players, grpSizes = c(2, 3))\n#>       Grp1     Grp1     Grp2    Grp2     Grp2    \n#>  [1,] "Ross"   "Bobby"  "Max"   "Casper" "Jake"  \n#>  [2,] "Ross"   "Max"    "Bobby" "Casper" "Jake"  \n#>  [3,] "Ross"   "Casper" "Bobby" "Max"    "Jake"  \n#>  [4,] "Ross"   "Jake"   "Bobby" "Max"    "Casper"\n#>  [5,] "Bobby"  "Max"    "Ross"  "Casper" "Jake"  \n#>  [6,] "Bobby"  "Casper" "Ross"  "Max"    "Jake"  \n#>  [7,] "Bobby"  "Jake"   "Ross"  "Max"    "Casper"\n#>  [8,] "Max"    "Casper" "Ross"  "Bobby"  "Jake"  \n#>  [9,] "Max"    "Jake"   "Ross"  "Bobby"  "Casper"\n#> [10,] "Casper" "Jake"   "Ross"  "Bobby"  "Max"\n
Run Code Online (Sandbox Code Playgroud)\n

它非常高效且非常灵活。让\xe2\x80\x99s 对更多的玩家进行测试。

\n
library(microbenchmark)\nmore_players <- c(players, "Kai", "Eliana", "Jayden", "Luca",\n                  "Rowan", "Nova", "Amara", "Finn", "Zion", "Mia")\n\nmicrobenchmark(\n    f1 = combn(more_players,\n               7,\n               function(x) list(team1 = x, team2 = more_players[!more_players %in% x]),\n               simplify = FALSE\n    ),\n    f2 = combn(more_players,\n               7,\n               function(x) list(team1 = x, team2 = setdiff(more_players, x)),\n               simplify = FALSE\n    ),\n    f3_rcpp = comboGeneral(\n        v = more_players,\n        m = 7,\n        repetition = FALSE,\n        FUN = function(x) list(team1 = x, team2 = more_players[!more_players %in% x])\n    ),\n    f4 = comboGroups(more_players, grpSizes = c(7, 8)),\n    unit = "relative"\n)\n#> Unit: relative\n#>     expr      min       lq     mean   median       uq       max neval  cld\n#>       f1 16.22184 19.03265 23.68141 20.78979 26.59770  92.08256   100 a   \n#>       f2 41.34947 45.55672 57.30847 54.70320 59.88822 123.09864   100  b  \n#>  f3_rcpp 11.83424 14.48575 17.65936 16.24856 21.43574  36.27697   100   c \n#>       f4  1.00000  1.00000  1.00000  1.00000  1.00000   1.00000   100    d\n
Run Code Online (Sandbox Code Playgroud)\n

超过2组

\n

更奇特的群体又如何呢?对于大多数其他方法,效率和代码维护将是一个问题。

\n

举例来说,给定more_players(总共 15 名玩家),如果我们想要找到包含 2 支规模为 3 的球队、一支规模为 4 的球队和一支规模为 5 的球队的所有可能分组,该怎么办?

\n

有了comboGroups,就没有问题了:

\n
system.time(t <- comboGroups(more_players, grpSizes = c(3, 3, 4, 5)))\n#>    user  system elapsed \n#>   0.508   0.040   0.548 \n\nhead(t, n = 2)\n#>      Grp1   Grp1    Grp1  Grp2     Grp2   Grp2  Grp3     Grp3     Grp3   Grp3   \n#> [1,] "Ross" "Bobby" "Max" "Casper" "Jake" "Kai" "Eliana" "Jayden" "Luca" "Rowan"\n#> [2,] "Ross" "Bobby" "Max" "Casper" "Jake" "Kai" "Eliana" "Jayden" "Luca" "Nova" \n#>      Grp4    Grp4    Grp4   Grp4   Grp4 \n#> [1,] "Nova"  "Amara" "Finn" "Zion" "Mia"\n#> [2,] "Rowan" "Amara" "Finn" "Zion" "Mia"\n\ntail(t, n = 2)\n#>            Grp1    Grp1   Grp1  Grp2   Grp2    Grp2   Grp3   Grp3     Grp3    \n#> [6306299,] "Rowan" "Zion" "Mia" "Nova" "Amara" "Finn" "Jake" "Eliana" "Jayden"\n#> [6306300,] "Rowan" "Zion" "Mia" "Nova" "Amara" "Finn" "Kai"  "Eliana" "Jayden"\n#>            Grp3   Grp4   Grp4    Grp4  Grp4     Grp4  \n#> [6306299,] "Luca" "Ross" "Bobby" "Max" "Casper" "Kai" \n#> [6306300,] "Luca" "Ross" "Bobby" "Max" "Casper" "Jake"\n
Run Code Online (Sandbox Code Playgroud)\n

如果您只需要可能团队的样本,请尝试comboGroupsSample

\n
comboGroupsSample(\n    more_players, grpSizes = c(3, 3, 4, 5), n = 2,\n    seed = 42, namedSample = TRUE\n)\n#>         Grp1    Grp1     Grp1   Grp2    Grp2    Grp2   Grp3   Grp3     Grp3    \n#> 3207141 "Bobby" "Jake"   "Mia"  "Luca"  "Amara" "Zion" "Max"  "Casper" "Eliana"\n#> 4248729 "Max"   "Casper" "Nova" "Rowan" "Amara" "Finn" "Ross" "Bobby"  "Kai"   \n#>         Grp3    Grp4   Grp4     Grp4     Grp4   Grp4  \n#> 3207141 "Rowan" "Ross" "Kai"    "Jayden" "Nova" "Finn"\n#> 4248729 "Luca"  "Jake" "Eliana" "Jayden" "Zion" "Mia"\n
Run Code Online (Sandbox Code Playgroud)\n


On_*_*and 6

只是为了构建@ThomasIsCoding 很好的答案。通过使用出色的 RcppAlgos 库和函数可以获得进一步的速度comboGeneral

players <- c("Ross", "Bobby", "Max", "Casper", "Jake")

microbenchmark(
  f1 = combn(players,
             3,
             function(x) list(team1 = x, team2 = players[!players %in% x]),
             simplify = FALSE
  ),
  f2 = combn(players,
             3,
             function(x) list(team1 = x, team2 = setdiff(players, x)),
             simplify = FALSE
  ),
  f3_rcpp = RcppAlgos::comboGeneral(
    v = players,
    m = 3,
    repetition = FALSE,
    FUN = function(x) list(team1 = x, team2 = players[!players %in% x])
  )
)
Run Code Online (Sandbox Code Playgroud)

基准

Unit: microseconds
    expr   min     lq    mean median     uq    max neval
      f1  34.3  37.95  63.693  39.00  41.45 2016.7   100
      f2 143.5 152.40 184.351 155.30 158.95 1961.7   100
 f3_rcpp  29.3  32.40  61.820  33.95  42.40 2205.0   100
Run Code Online (Sandbox Code Playgroud)