How to create a random match between the rows of two data.tables (or data.frames)

Ben*_*Ben 10 r data.table

For this example, I'll be using the data.table package.

Suppose you have a table of coaches

coaches <- data.table(CoachID=c(1,2,3), CoachName=c("Bob","Sue","John"), NumPlayers=c(2,3,0))
coaches
   CoachID CoachName NumPlayers
1:       1       Bob          2
2:       2       Sue          3
3:       3      John          0

and a table of players

players <- data.table(PlayerID=c(1,2,3,4,5,6), PlayerName=c("Abe","Bart","Chad","Dalton","Egor","Frank"))
players
   PlayerID PlayerName
1:        1        Abe
2:        2       Bart
3:        3       Chad
4:        4     Dalton
5:        5       Egor
6:        6      Frank

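You want to match each coach to a set of players such that

  • the number of players associated with each coach is given by the NumPlayers field
  • no two coaches are tied to the same player
  • players and coaches are matched at random

How would you do this?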

exampleResult <- data.table(CoachID=c(1,1,2,2,2,3), PlayerID=c(3,1,2,5,6,NA))
exampleResult

   CoachID PlayerID
1:       1        3
2:       1        1
3:       2        2
4:       2        5
5:       2        6
6:       3       NA
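The first two requirements above can be checked mechanically for any candidate result. The following is a minimal sketch in base R; `check_assignment` is a hypothetical helper name, not part of data.table, and it assumes the result uses the `CoachID` and `PlayerID` column names from the example. It cannot test randomness, only the two hard constraints.

```r
# Hypothetical helper: TRUE if `result` respects NumPlayers and shares no player.
check_assignment <- function(result, coaches) {
  # Rows with PlayerID == NA are placeholders for coaches without players
  assigned <- result[!is.na(result$PlayerID), ]

  # No two coaches are tied to the same player
  no_shared_players <- !any(duplicated(assigned$PlayerID))

  # Each coach has exactly NumPlayers players (coaches with 0 are counted too)
  counts <- table(factor(assigned$CoachID, levels = coaches$CoachID))
  counts_ok <- all(as.integer(counts) == coaches$NumPlayers)

  no_shared_players && counts_ok
}

coaches <- data.frame(CoachID = c(1, 2, 3), NumPlayers = c(2, 3, 0))
exampleResult <- data.frame(CoachID  = c(1, 1, 2, 2, 2, 3),
                            PlayerID = c(3, 1, 2, 5, 6, NA))
check_assignment(exampleResult, coaches)  # TRUE
```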

jos*_*ber 6

You can sample the player IDs without replacement, drawing the total number of players you need:

set.seed(144)
(selections <- sample(players$PlayerID, sum(coaches$NumPlayers)))
# [1] 1 4 3 2 6

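Every player has the same probability of being included in selections, and the ordering of that vector is random. You can therefore assign these players to the coach slots in order: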

data.frame(CoachID=rep(coaches$CoachID, coaches$NumPlayers),
           PlayerID=selections)
#   CoachID PlayerID
# 1       1        1
# 2       1        4
# 3       2        3
# 4       2        2
# 5       2        6

If you want an NA value for any coach who was not assigned a player, you could do the following:

rbind(data.frame(CoachID=rep(coaches$CoachID, coaches$NumPlayers),
                 PlayerID=selections),
      data.frame(CoachID=coaches$CoachID[coaches$NumPlayers==0],
                 PlayerID=rep(NA, sum(coaches$NumPlayers==0))))
#   CoachID PlayerID
# 1       1        1
# 2       1        4
# 3       2        3
# 4       2        2
# 5       2        6
# 6       3       NA
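Two edge cases in the sampling step may be worth guarding against. `sample(x, n)` treats a single number `x >= 1` as the range `1:x`, so a players table with exactly one row could yield IDs that do not exist, and `sample()` stops with an error when more players are requested than are available. The sketch below wraps the draw in a hypothetical helper, `draw_players` (a name introduced here for illustration), which samples by index instead.

```r
# Hypothetical wrapper around the sampling step; not part of the answer above.
draw_players <- function(player_ids, n_needed) {
  if (n_needed > length(player_ids)) {
    stop("need ", n_needed, " players but only ",
         length(player_ids), " are available")
  }
  # Sampling positions avoids sample()'s special case for a length-one
  # numeric vector, which would be expanded to 1:x.
  player_ids[sample.int(length(player_ids), n_needed)]
}

draw_players(7, 1)    # always 7, whereas sample(7, 1) draws from 1:7
draw_players(1:6, 5)  # five distinct IDs in random order
```

For vectors with more than one element this draws in the same way as `sample(player_ids, n_needed)`, so it can replace that call in the first step without changing the rest of the approach.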


Fra*_*ank 5

Get the demand and supply on each side, so to speak:

demand <- with(coaches,rep(CoachID,NumPlayers))
supply <- players$PlayerID

Then I would...

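randmatch <- function(demand, supply) {
  n_demand  <- length(demand)
  n_supply  <- length(supply)
  n_matches <- min(n_demand, n_supply)

  # Sample by index: sample(x, n) would treat a length-one numeric x as 1:x.
  if (n_demand >= n_supply) {
    # At least as many slots as players: pick which slots get filled
    data.frame(d = demand[sample.int(n_demand, n_matches)], s = supply)
  } else {
    # More players than slots: pick which players are used
    data.frame(d = demand, s = supply[sample.int(n_supply, n_matches)])
  }
}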

Examples:

set.seed(1)
randmatch(demand,supply)    # some players unmatched, OP's example
randmatch(rep(1:3,1:3),1:4) # some coaches unmatched 

I'm not sure this is a case the OP wants to cover, though.
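To see what the two calls above return without depending on a particular seed, one can look at the shape of the result rather than its values. This is a sketch only: it carries its own copy of randmatch, written with index-based sampling so that a length-one demand or supply vector is not expanded by sample(), and the names `surplus_players` and `surplus_slots` are introduced here for illustration.

```r
randmatch <- function(demand, supply) {
  n_demand  <- length(demand)
  n_supply  <- length(supply)
  n_matches <- min(n_demand, n_supply)
  if (n_demand >= n_supply) {
    data.frame(d = demand[sample.int(n_demand, n_matches)], s = supply)
  } else {
    data.frame(d = demand, s = supply[sample.int(n_supply, n_matches)])
  }
}

# 5 coach slots, 6 players (the OP's example)
surplus_players <- randmatch(rep(c(1, 2, 3), c(2, 3, 0)), 1:6)
# 6 coach slots, 4 players
surplus_slots   <- randmatch(rep(1:3, 1:3), 1:4)

nrow(surplus_players)                  # 5: every slot is filled, one player is left out
nrow(surplus_slots)                    # 4: every player is used, two slots stay empty
anyDuplicated(surplus_players$s) == 0  # TRUE: no player is matched twice
table(surplus_slots$d)                 # filled slots per coach (coaches with none are omitted)
```

In both cases the number of rows is the smaller of the two lengths, and which slots or players are dropped varies from run to run.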


For the OP's desired output...

m <- randmatch(demand,supply)
merge(m,coaches,by.x="d",by.y="CoachID",all=TRUE)
#   d  s CoachName NumPlayers
# 1 1  2       Bob          2
# 2 1  6       Bob          2
# 3 2  3       Sue          3
# 4 2  4       Sue          3
# 5 2  1       Sue          3
# 6 3 NA      John          0

Similarly...

merge(m,players,by.x="s",by.y="PlayerID",all=TRUE)
#   s  d PlayerName
# 1 1  2        Abe
# 2 2  1       Bart
# 3 3  2       Chad
# 4 4  2     Dalton
# 5 5 NA       Egor
# 6 6  1      Frank
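The two merges can also be chained to get one table that lists both unmatched coaches and unmatched players. The sketch below is an assumption about how one might combine them, not part of the answer; `m` is hard-coded from the match printed above so that the example does not depend on the random seed, and `full` is a name introduced here.

```r
coaches <- data.frame(CoachID = c(1, 2, 3),
                      CoachName = c("Bob", "Sue", "John"),
                      NumPlayers = c(2, 3, 0))
players <- data.frame(PlayerID = 1:6,
                      PlayerName = c("Abe", "Bart", "Chad", "Dalton", "Egor", "Frank"))

# The match shown in the output above, copied by hand
m <- data.frame(d = c(1, 1, 2, 2, 2), s = c(2, 6, 3, 4, 1))

# all = TRUE keeps rows that have no partner on the other side
with_coaches <- merge(m, coaches, by.x = "d", by.y = "CoachID", all = TRUE)
full <- merge(with_coaches, players, by.x = "s", by.y = "PlayerID", all = TRUE)

full[, c("d", "CoachName", "s", "PlayerName")]
```

The result has seven rows: the five matches, one row for John with no player, and one row for Egor with no coach.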