Multiply two data.tables, keeping all possibilities

era*_*rtg 6 r data.table

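I can't find a duplicate of this right now.

My question is as follows:

I have two data.tables. One has two columns (featurea, count), the other has three columns (featureb, featurec, count). I want to multiply (?) them so that I get a new data.table containing all possible combinations. The catch is that the features don't match, so a merge solution may not do the trick.

An MRE follows: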

library(data.table)

# two columns
DT1 <- data.table(featurea = c("type1", "type2"), count = c(2, 3))

#       featurea count
#1:    type1     2
#2:    type2     3

#three columns
DT2 <- data.table(origin =c("house","park","park"), color =c("red","blue","red"),count =c(2,1,2))

#   origin color count
#1:  house   red     2
#2:   park  blue     1
#3:   park   red     2

In this case, my expected result data.table is as follows:

> DT3
   origin color featurea total
1:  house   red    type1     4
2:  house   red    type2     6
3:   park  blue    type1     2
4:   park  blue    type2     3
5:   park   red    type1     4
6:   park   red    type2     6

Rol*_*and 8

Please test this on larger data; I'm not sure how well optimized it is:

DT2[, .(featurea = DT1[["featurea"]], 
        count = count * DT1[["count"]]), by = .(origin, color)]
#   origin color featurea count
#1:  house   red    type1     4
#2:  house   red    type2     6
#3:   park  blue    type1     2
#4:   park  blue    type2     3
#5:   park   red    type1     4
#6:   park   red    type2     6

If DT1 has the smaller number of groups, it may be more efficient to switch it around:

DT1[, c(DT2[, .(origin, color)], 
        .(count = count * DT2[["count"]])), by = featurea]
#   featurea origin color count
#1:    type1  house   red     4
#2:    type1   park  blue     2
#3:    type1   park   red     4
#4:    type2  house   red     6
#5:    type2   park  blue     3
#6:    type2   park   red     6
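As a quick sanity check (a sketch, not part of the original answer), the two variants can be compared as sets of rows with `fsetequal()`. The object names `a` and `b` are only illustrative. Note that the first variant assumes each (origin, color) pair occurs once in DT2; with duplicated pairs, `count` has more than one element per group and the product would no longer be a full cross product.

```r
library(data.table)

DT1 <- data.table(featurea = c("type1", "type2"), count = c(2, 3))
DT2 <- data.table(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# Variant 1: group DT2 by its feature columns
a <- DT2[, .(featurea = DT1[["featurea"]],
             count = count * DT1[["count"]]), by = .(origin, color)]

# Variant 2: group DT1 by featurea
b <- DT1[, c(DT2[, .(origin, color)],
             .(count = count * DT2[["count"]])), by = featurea]

# Align the column order, then compare ignoring row order
setcolorder(b, names(a))
fsetequal(a, b)
```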


jaz*_*rro 6

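This would be one way. First, I expanded the rows of DT2 with expandRows() from the splitstackshape package. Since I specified count = nrow(DT1) (2 here) and count.is.col = FALSE, each row is repeated twice. Then I handled the multiplication and created a new column called total. At the same time, I created a new column for featurea. Finally, I dropped count.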

library(data.table)
library(splitstackshape)

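expandRows(DT2, count = nrow(DT1), count.is.col = FALSE)[,
    `:=` (total = count * DT1[, count],
          # repeat featurea explicitly: recent data.table versions
          # do not recycle length > 1 vectors in `:=`
          featurea = rep(DT1[, featurea], nrow(DT2)))][, count := NULL]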

Edit

If you don't want to add another package, you can try David's idea from the comments.

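# each = ... repeats every DT2 row consecutively (1,1,2,2,3,3),
# so the rows line up with the repeated DT1 values
DT2[rep(1:.N, each = nrow(DT1))][,
   `:=`(total = count * DT1$count,
        featurea = rep(DT1$featurea, nrow(DT2)),
        count = NULL)][]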



#   origin color total featurea
#1:  house   red     4    type1
#2:  house   red     6    type2
#3:   park  blue     2    type1
#4:   park  blue     3    type2
#5:   park   red     4    type1
#6:   park   red     6    type2
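For comparison, here is a minimal sketch that builds the row pairing explicitly instead of relying on repetition and recycling. It is not from the answers above: it uses `CJ()` to enumerate every (DT2 row, DT1 row) index pair, and the names `idx`, `i` and `j` are only illustrative.

```r
library(data.table)

DT1 <- data.table(featurea = c("type1", "type2"), count = c(2, 3))
DT2 <- data.table(origin = c("house", "park", "park"),
                  color  = c("red", "blue", "red"),
                  count  = c(2, 1, 2))

# All pairs of row indices: i indexes DT2, j indexes DT1.
# CJ() returns them sorted by i, then j.
idx <- CJ(i = seq_len(nrow(DT2)), j = seq_len(nrow(DT1)))

# Pick the matching rows from each table and multiply the counts
DT3 <- data.table(
  origin   = DT2$origin[idx$i],
  color    = DT2$color[idx$i],
  featurea = DT1$featurea[idx$j],
  total    = DT2$count[idx$i] * DT1$count[idx$j]
)
DT3
# total is 4, 6, 2, 3, 4, 6, matching the expected DT3 in the question
```

Because every pair is indexed explicitly, this does not depend on the row counts of the two tables or on unique (origin, color) pairs, at the cost of materialising the index table first.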