Restructuring team-level data to the individual level in R (while retaining team-level information)

wax*_*tax 6 r

My data currently look like this:

Person  Team
  10    100
  11    100
  12    100
  10    200
  11    200
  14    200
  15    200

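I'd like to infer who knows whom based on the teams they have been on together. I also want to count how many times each pair of people has been on a team together, and I want to keep track of the team IDs that link each pair. In other words, I'd like to create a dataset like this: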

Person1 Person2 Count   Team1   Team2   Team3
   10      11     2      100     200     NA
   10      12     1      100     NA      NA
   11      12     1      100     NA      NA
   10      14     1      200     NA      NA
   10      15     1      200     NA      NA
   11      14     1      200     NA      NA
   11      15     1      200     NA      NA
   14      15     1      200     NA      NA

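The resulting dataset captures the relationships that can be inferred from the teams in the original dataset. The "Count" variable is the number of teams a pair of people have been on together. The "Team1", "Team2", and "Team3" variables list the team IDs that link each pair of people. It makes no difference which person (or which team ID) is listed first. Teams range in size from 2 to 8 members.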
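Since every pair of teammates is linked once per team they share, a team of n members contributes choose(n, 2) = n(n - 1)/2 pairs. The stated team sizes (2 to 8) therefore bound how many pair rows each team can add. A quick base R check, as a sketch:

```r
# A team of n members yields choose(n, 2) = n * (n - 1) / 2 pairs
team_sizes <- 2:8
pairs_per_team <- setNames(choose(team_sizes, 2), team_sizes)
pairs_per_team
#  2  3  4  5  6  7  8
#  1  3  6 10 15 21 28

# Sample data: team 100 has 3 members and team 200 has 4,
# so together they contribute 3 + 6 = 9 (team, pair) rows
sum(choose(c(3, 4), 2))
# [1] 9
```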

A5C*_*2T1 6

Here's a "data.table" solution that seems to get you where you want to go (albeit with a lot of code):

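library(data.table)

## Sample data from the question
d <- data.frame(Person = c(10, 11, 12, 10, 11, 14, 15),
                Team   = c(100, 100, 100, 200, 200, 200, 200))
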
dcast.data.table(
  dcast.data.table(
    as.data.table(d)[, combn(sort(Person), 2), by = Team][
      , ind := paste0("Person", c(1, 2))][
        , time := sequence(.N), by = list(Team, ind)], 
    time + Team ~ ind, value.var = "V1")[
      , c("count", "time") := list(.N, sequence(.N)), by = list(Person1, Person2)],
  Person1 + Person2 + count ~ time, value.var = "Team")
#    Person1 Person2 count   1   2
# 1:      10      11     2 100 200
# 2:      10      12     1 100  NA
# 3:      10      14     1 200  NA
# 4:      10      15     1 200  NA
# 5:      11      12     1 100  NA
# 6:      11      14     1 200  NA
# 7:      11      15     1 200  NA
# 8:      14      15     1 200  NA
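The result labels the team columns "1", "2", ... (the values of `time`), while the question asked for Team1, Team2, .... A minimal sketch of renaming them with data.table's `setnames()`, assuming the result is stored in a data.table called `out`. Here `out` is a hypothetical two-row stand-in built by hand so the snippet runs on its own:

```r
library(data.table)

# Hypothetical stand-in for the reshaped result: dcast names the team
# columns after the values of `time` ("1", "2", ...), not "Team1", "Team2"
out <- data.table(Person1 = c(10, 10), Person2 = c(11, 12),
                  count = c(2L, 1L), `1` = c(100, 100), `2` = c(200, NA))

# Every column that is not a person ID or the count holds a team ID
team_cols <- setdiff(names(out), c("Person1", "Person2", "count"))
setnames(out, team_cols, paste0("Team", team_cols))
names(out)
# [1] "Person1" "Person2" "count"   "Team1"   "Team2"
```

Because the wide step spreads on `time`, there will be as many team columns as the largest `count` in the data, so the renaming does not need to know that number in advance.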

Update: a step-by-step version of the above

To understand what's happening above, here's a step-by-step approach:

## step1 is a long data.table with 4 columns:
##   Team, V1 (a person ID), ind ("Person1" or "Person2"),
##   and time (the pair's position within its team)
step1 <- as.data.table(d)[
  , combn(sort(Person), 2), by = Team][  # sorted, so each pair has one fixed order
    , ind := paste0("Person", c(1, 2))][
      , time := sequence(.N), by = list(Team, ind)]
head(step1)
#    Team V1     ind time
# 1:  100 10 Person1    1
# 2:  100 11 Person2    1
# 3:  100 10 Person1    2
# 4:  100 12 Person2    2
# 5:  100 11 Person1    3
# 6:  100 12 Person2    3

## Here, we make the data "wide"
step2 <- dcast.data.table(step1, time + Team ~ ind, value.var = "V1")
step2
#    time Team Person1 Person2
# 1:    1  100      10      11
# 2:    1  200      10      11
# 3:    2  100      10      12
# 4:    2  200      10      14
# 5:    3  100      11      12
# 6:    3  200      10      15
# 7:    4  200      11      14
# 8:    5  200      11      15
# 9:    6  200      14      15

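## Add a "count" column (number of teams the pair shares) and overwrite
##   "time" with a running index within each Person1/Person2 pair;
##   that index becomes the column key when going wide again.
##   Note: `:=` modifies step2 by reference, so step2 and step3
##   end up being the same table.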
step3 <- step2[, c("count", "time") := list(.N, sequence(.N)), 
               by = list(Person1, Person2)]
step3
#    time Team Person1 Person2 count
# 1:    1  100      10      11     2
# 2:    2  200      10      11     2
# 3:    1  100      10      12     1
# 4:    1  200      10      14     1
# 5:    1  100      11      12     1
# 6:    1  200      10      15     1
# 7:    1  200      11      14     1
# 8:    1  200      11      15     1
# 9:    1  200      14      15     1

## The final step of going wide
out <- dcast.data.table(step3, Person1 + Person2 + count ~ time, 
                        value.var = "Team")
out
#    Person1 Person2 count   1   2
# 1:      10      11     2 100 200
# 2:      10      12     1 100  NA
# 3:      10      14     1 200  NA
# 4:      10      15     1 200  NA
# 5:      11      12     1 100  NA
# 6:      11      14     1 200  NA
# 7:      11      15     1 200  NA
# 8:      14      15     1 200  NA
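Step 3 groups by `Person1` and `Person2` as an ordered pair, so repeated pairs are only merged correctly when every team lists its members in the same order (ascending, in the sample data). `combn()` keeps the order in which values are supplied, as this small illustration shows; the second team below is hypothetical:

```r
# combn() keeps the input order, so the same two people can come out
# as (10, 11) for one team and (11, 10) for another
members_team_a <- c(10, 11, 12)
members_team_b <- c(11, 10, 14)   # hypothetical team listing 11 before 10

combn(members_team_a, 2)[, 1]        # 10 11
combn(members_team_b, 2)[, 1]        # 11 10

# Sorting each team's members first gives every pair one canonical order
combn(sort(members_team_b), 2)[, 1]  # 10 11
```

In the data.table pipeline this amounts to calling `combn(sort(Person), 2)` inside the `by = Team` grouping. The sample data are already sorted within each team, so the printed results would not change.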