How to use data.table's j to create multiple new columns on different subsets

Con*_*r J 2 r subset data.table

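I want to create several variables, each aggregating a different subset of a dataset. As an illustrative example, suppose you have the following data: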

library(data.table)

DT = data.table(Group1 = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4), 
                Group2 = c(1,1,1,2,2,1,1,2,2,2,1,1,1,1,2,1,1,2,2,2), 
                  Var1 = c(1,1,0,0,0,1,1,0,1,0,1,0,0,0,0,0,0,0,0,0))

I want to compute several means of Var1. Specifically, I want to know:

  • mean(Var1), grouped by Group1
  • mean(Var1), grouped by Group1, only for rows where Group2 == 1
  • mean(Var1), grouped by Group1, only for rows where Group2 == 2

Or, in data.table terms,

DT[, mean(Var1), by=Group1]
DT[Group2==1, mean(Var1), by=Group1]
DT[Group2==2, mean(Var1), by=Group1]

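Obviously, computing any one of these is straightforward. But I can't find a good way to compute all three at once, because each uses a different subset in i. The solution I have been using so far is to generate them separately and then merge them into a single table.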

DT_all <- DT[, .(avgVar1_all = mean(Var1)), by = Group1]
DT_1 <- DT[Group2 == 1, .(avgVar1_1 = mean(Var1)), by = Group1]
DT_2 <- DT[Group2 == 2, .(avgVar1_2 = mean(Var1)), by = Group1]
group_info <- merge(DT_all, DT_1, by = "Group1")
group_info <- merge(group_info, DT_2, by = "Group1")

group_info
#    Group1 avgVar1_all avgVar1_1 avgVar1_2
# 1:      1         0.4 0.6666667 0.0000000
# 2:      2         0.6 1.0000000 0.3333333
# 3:      3         0.2 0.2500000 0.0000000
# 4:      4         0.0 0.0000000 0.0000000

Is there a more elegant approach I could use?

the*_*ail 5

Just do it all in a single grouped operation, subsetting .SD inside j:

DT[, .(
        all  = mean(Var1),
        grp1 = .SD[Group2==1, mean(Var1)],
        grp2 = .SD[Group2==2, mean(Var1)]
      ),
  by = Group1,
  .SDcols=c("Group2","Var1")
  ]

#   Group1 all      grp1      grp2
#1:      1 0.4 0.6666667 0.0000000
#2:      2 0.6 1.0000000 0.3333333
#3:      3 0.2 0.2500000 0.0000000
#4:      4 0.0 0.0000000 0.0000000
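As a possible variant of the same idea (not part of the original answer), the Var1 vector can be subset directly with a logical condition inside j, which avoids building .SD for each group. The sketch below reuses the question's data; the name `res` is only illustrative.

```r
library(data.table)

# Same example data as in the question
DT <- data.table(
  Group1 = rep(c(1, 2, 3, 4), each = 5),
  Group2 = c(1,1,1,2,2, 1,1,2,2,2, 1,1,1,1,2, 1,1,2,2,2),
  Var1   = c(1,1,0,0,0, 1,1,0,1,0, 1,0,0,0,0, 0,0,0,0,0)
)

# Within each Group1 group, Var1 and Group2 are plain vectors,
# so a logical index picks out the rows of interest.
res <- DT[, .(
  all  = mean(Var1),
  grp1 = mean(Var1[Group2 == 1]),
  grp2 = mean(Var1[Group2 == 2])
), by = Group1]

print(res)
```

One difference from the merge-based approach is worth keeping in mind: merge() performs an inner join by default, so a Group1 value with no Group2 == 1 rows would be dropped from the merged table, whereas here that group is kept and its mean is NaN (the mean of an empty vector).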