Count rows in a data.table grouped by multiple columns, including "empty" groups

And*_*w S 0 r data.table

I have a data.table that looks like this:

    ID      Date        Team    MonthFactor
1   2512    2015-04-24  Purple  2015-04
2   2512    2015-04-25  Purple  2015-04
3   2512    2015-04-26  Purple  2015-04
4   2512    2015-04-27  Purple  2015-04

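I want to get the number of rows grouped by both Team and MonthFactor, including combinations that have no rows for a given month. That is, if team Purple had no entries in May but Yellow did, the summary table would look like this: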

    Team    MonthFactor N
1   Purple  2015-04     10
2   Purple  2015-05     0
3   Yellow  2015-04     5
4   Yellow  2015-05     7

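This would be trivial if I didn't need the "empty" groups, but I can't work out how to specify which groups should be evaluated when there may be no rows containing a given MonthFactor.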

Jaa*_*aap 5

You can achieve this with a cross join:

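# Count the observed groups, then join onto every Team x MonthFactor combination.
# The CJ() arguments are named explicitly so the join does not depend on default V1/V2 names.
dat[, .N, by = .(Team, MonthFactor)
    ][CJ(Team = Team, MonthFactor = MonthFactor, unique = TRUE), on = .(Team, MonthFactor)
      ][is.na(N), N := 0L][]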

This gives:

     Team MonthFactor N
1: Purple     2015-04 2
2: Purple     2015-05 0
3: Yellow     2015-04 5
4: Yellow     2015-05 3
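To make the cross join step more concrete, here is a minimal sketch that builds the grid of combinations separately before joining. The small `dat` below uses character columns and is made up for illustration; `grid`, `counts` and `res` are hypothetical names, not part of the answer.

```r
library(data.table)

# Toy data (an assumption for illustration): Purple has no rows in 2015-05.
dat <- data.table(
  Team        = c("Purple", "Purple", "Yellow", "Yellow", "Yellow"),
  MonthFactor = c("2015-04", "2015-04", "2015-04", "2015-05", "2015-05")
)

# CJ() returns every combination of the unique values, sorted.
grid <- CJ(Team = dat$Team, MonthFactor = dat$MonthFactor, unique = TRUE)

# Counts for the groups that actually occur in the data.
counts <- dat[, .N, by = .(Team, MonthFactor)]

# Joining counts onto the grid keeps every grid row; unmatched rows get N = NA.
res <- counts[grid, on = .(Team, MonthFactor)]

# Replace the NA counts of the "empty" groups with 0.
res[is.na(N), N := 0L]
print(res)
```

The result has one row per row of `grid`, in the grid's order, so the combination that is missing from the data (Purple, 2015-05) shows up with a count of 0.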

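The advantage of this approach is that it makes it easier to include other variables. Assuming ID is just a numeric value, consider the following example: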

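# Same pattern with an extra aggregate; combinations without rows get 0 in both columns.
dat[, .(.N, sID = sum(ID)), by = .(Team, MonthFactor)
    ][CJ(Team = Team, MonthFactor = MonthFactor, unique = TRUE), on = .(Team, MonthFactor)
      ][is.na(N), `:=` (N = 0L, sID = 0L)][]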

This gives:

     Team MonthFactor N   sID
1: Purple     2015-04 2  5024
2: Purple     2015-05 0     0
3: Yellow     2015-04 5 12560
4: Yellow     2015-05 3  7536
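If only the row count is needed, a related idiom (a variation, not the method shown in this answer) is to join the original table onto the grid and count per grid row with `by = .EACHI`. The sketch below again uses a made-up `dat` with character columns; `grid` and `res` are hypothetical names.

```r
library(data.table)

# Toy data (an assumption for illustration): Purple has no rows in 2015-05.
dat <- data.table(
  Team        = c("Purple", "Purple", "Yellow", "Yellow", "Yellow"),
  MonthFactor = c("2015-04", "2015-04", "2015-04", "2015-05", "2015-05")
)

# All combinations of the unique Team and MonthFactor values.
grid <- CJ(Team = dat$Team, MonthFactor = dat$MonthFactor, unique = TRUE)

# by = .EACHI evaluates .N once per row of grid;
# a grid row with no matching rows in dat yields N = 0.
res <- dat[grid, on = .(Team, MonthFactor), .N, by = .EACHI]
print(res)
```

This skips the separate `is.na()` step for the count, but other aggregates such as `sum(ID)` would still need their own handling for combinations with no matching rows.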

Data used:

dat <- structure(list(ID = c(2512L, 2512L, 2512L, 2512L, 2512L, 2512L, 2512L, 2512L, 2512L, 2512L), 
                      Date = structure(c(1L, 2L, 1L, 2L, 3L, 4L, 4L, 2L, 3L, 4L), .Label = c("2015-04-24", "2015-04-25", "2015-04-26", "2015-04-27"), class = "factor"), 
                      Team = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Purple", "Yellow"), class = "factor"), 
                      MonthFactor = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("2015-04", "2015-05"), class = "factor")),
                 .Names = c("ID", "Date", "Team", "MonthFactor"), class = c("data.table", "data.frame"), row.names = c(NA, -10L))