I have a data.table that looks like this:
ID Date Team MonthFactor
1 2512 2015-04-24 Purple 2015-04
2 2512 2015-04-25 Purple 2015-04
3 2512 2015-04-26 Purple 2015-04
4 2512 2015-04-27 Purple 2015-04
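I want the number of rows grouped by both Team and MonthFactor, including the cases where a team has no rows for a given month. That is, if the Purple team had no entries in May but the Yellow team did, the summary table would look like this: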
Team MonthFactor N
1 Purple 2015-04 10
2 Purple 2015-05 0
3 Yellow 2015-04 5
4 Yellow 2015-05 7
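If I didn't need the "empty" groups this would be trivial, but I can't work out how to specify which groups should be evaluated when there may be no rows for a given MonthFactor.
You can achieve this with a cross join: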
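# count the rows per Team/MonthFactor combination that is present in the data
dat[, .N, .(Team, MonthFactor)
    # join to every Team x MonthFactor combination; the CJ() columns are named
    # explicitly so the join does not rely on auto-generated names (V1, V2)
    ][CJ(Team = Team, MonthFactor = MonthFactor, unique = TRUE), on = .(Team, MonthFactor)
      # combinations without rows come back as NA, so set their count to 0
      ][is.na(N), N := 0][]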
which gives:
Team MonthFactor N
1: Purple 2015-04 2
2: Purple 2015-05 0
3: Yellow 2015-04 5
4: Yellow 2015-05 3
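To make the chained call above easier to follow, here is a minimal sketch that runs the same idea one step at a time. It is an illustration under assumptions, not the original code: the intermediate names `grid`, `counts` and `res` are hypothetical, and the data is rebuilt with character columns rather than factors to keep the example short.

```r
library(data.table)

# Assumed stand-in for the example data: 2 Purple rows and 8 Yellow rows,
# the first 7 rows in 2015-04 and the last 3 in 2015-05
dat <- data.table(
  ID          = 2512L,
  Team        = rep(c("Purple", "Yellow"), times = c(2, 8)),
  MonthFactor = rep(c("2015-04", "2015-05"), times = c(7, 3))
)

# Step 1: every Team x MonthFactor combination, whether or not it has rows
grid <- CJ(Team = dat$Team, MonthFactor = dat$MonthFactor, unique = TRUE)

# Step 2: counts for the combinations that actually occur
counts <- dat[, .N, by = .(Team, MonthFactor)]

# Step 3: keep every row of grid; combinations missing from counts get N = NA
res <- counts[grid, on = .(Team, MonthFactor)]

# Step 4: replace the NA counts with 0
res[is.na(N), N := 0L]
print(res)
```

Building `grid` outside the join keeps each intermediate table available for inspection, which can help when the set of required groups comes from somewhere other than the data itself.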
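The advantage of this approach is that it makes it easier to include other variables. Supposing that ID is just a numeric value, consider the following example: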
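# count the rows and sum ID per Team/MonthFactor combination present in the data
dat[, .(.N, sID = sum(ID)), .(Team, MonthFactor)
    # join to every Team x MonthFactor combination; the CJ() columns are named
    # explicitly so the join does not rely on auto-generated names (V1, V2)
    ][CJ(Team = Team, MonthFactor = MonthFactor, unique = TRUE), on = .(Team, MonthFactor)
      # fill both summary columns for the combinations without rows
      ][is.na(N), `:=` (N = 0, sID = 0)][]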
This gives:
Team MonthFactor N sID
1: Purple 2015-04 2 5024
2: Purple 2015-05 0 0
3: Yellow 2015-04 5 12560
4: Yellow 2015-05 3 7536
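As a possible alternative to aggregating first and joining afterwards, the summaries can also be computed during the join itself with `by = .EACHI`. This is a different technique from the one shown above, sketched here under assumptions: the name `grid` is hypothetical, the data is rebuilt with character columns, and `na.rm = TRUE` is added so that combinations without rows sum to 0.

```r
library(data.table)

# Assumed stand-in for the example data (character columns instead of factors)
dat <- data.table(
  ID          = 2512L,
  Team        = rep(c("Purple", "Yellow"), times = c(2, 8)),
  MonthFactor = rep(c("2015-04", "2015-05"), times = c(7, 3))
)

# Every Team x MonthFactor combination
grid <- CJ(Team = dat$Team, MonthFactor = dat$MonthFactor, unique = TRUE)

# Evaluate j once per row of grid; .N is the number of rows of dat matched
# by that grid row, so unmatched combinations get N = 0
res <- dat[grid, on = .(Team, MonthFactor),
           .(N = .N, sID = sum(ID, na.rm = TRUE)),
           by = .EACHI]
print(res)
```

With this form there is no separate NA-filling step, but every summary expression has to cope with groups that matched no rows, which is why `sum()` is called with `na.rm = TRUE` here.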
Data used:
dat <- structure(list(ID = c(2512L, 2512L, 2512L, 2512L, 2512L, 2512L, 2512L, 2512L, 2512L, 2512L),
Date = structure(c(1L, 2L, 1L, 2L, 3L, 4L, 4L, 2L, 3L, 4L), .Label = c("2015-04-24", "2015-04-25", "2015-04-26", "2015-04-27"), class = "factor"),
Team = structure(c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Purple", "Yellow"), class = "factor"),
MonthFactor = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("2015-04", "2015-05"), class = "factor")),
.Names = c("ID", "Date", "Team", "MonthFactor"), class = c("data.table", "data.frame"), row.names = c(NA, -10L))