R data.table: group by an attribute of the i table

sch*_*h56 7 r data.table

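I would like to use columns of the i table for calculations and grouping within a data.table join. The syntax seems to have some limitations. Can you suggest a cleaner approach?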

require(data.table)
set.seed(1)

Table 1

DT1 <- data.table(loc = c("L1","L2"), product = c("P1","P2","P3"), qty = runif(12))

Table 2

DT2 <- data.table(product = c("P1","P2","P3"), family = c("A","A","B"), price = c(5,7,10))

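A direct join on the tables works fine. [This is not the issue here, but the requirement to use quoted column names in the on clause seems inconsistent with the rest of data.table.]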

DT1[DT2, on = "product"]
#    loc product       qty family price
# 1:  L1      P1 0.1297134      A     5
# 2:  L2      P1 0.2423550      A     5
# 3:  L1      P1 0.3421633      A     5
# 4:  L2      P1 0.6537663      A     5
# 5:  L2      P2 0.9822407      A     7
# 6:  L1      P2 0.8568853      A     7
# 7:  L2      P2 0.7062672      A     7
# 8:  L1      P2 0.9224086      A     7
# 9:  L1      P3 0.8267184      B    10
#10:  L2      P3 0.8408788      B    10
#11:  L1      P3 0.6212432      B    10
#12:  L2      P3 0.5363538      B    10
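On the remark about quoted names: data.table also accepts an unquoted list form, on = .(product). The sketch below uses two small tables with made-up values (x and i are illustrative names, not the DT1 and DT2 above) and checks that both forms give the same join.

```r
library(data.table)

# Illustrative stand-ins for the two tables; the values are made up
x <- data.table(product = c("P1", "P2", "P3", "P1"), qty = c(1, 2, 3, 4))
i <- data.table(product = c("P1", "P2", "P3"), price = c(5, 7, 10))

quoted   <- x[i, on = "product"]   # character form, as in the question
unquoted <- x[i, on = .(product)]  # list form, no quotes needed

# TRUE when the two results match
all.equal(quoted, unquoted)
```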

Calculations using columns from both tables work fine:

DT1[DT2, .(family, product, val = qty*price), on = "product"]
#    family product       val
# 1:      A      P1 0.6485671
# 2:      A      P1 1.2117750
# 3:      A      P1 1.7108164
# 4:      A      P1 3.2688313
# 5:      A      P2 6.8756851
# 6:      A      P2 5.9981971
# 7:      A      P2 4.9438704
# 8:      A      P2 6.4568599
# 9:      B      P3 8.2671841
#10:      B      P3 8.4087878
#11:      B      P3 6.2124323
#12:      B      P3 5.3635379

I can group and aggregate with by = .EACHI:

DT1[DT2,.(family, product, val = sum(qty*price)), on = "product", by = .EACHI]
#   product family product      val
#1:      P1      A      P1  6.83999
#2:      P2      A      P1 24.27461
#3:      P3      B      P1 28.25194
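With by = .EACHI the join column is already returned as the grouping column, so listing product again in j produces the second product column seen in the output. A minimal sketch that leaves it out of j, using fixed illustrative qty values in place of runif(12) so the result is reproducible:

```r
library(data.table)

# Same shape as the tables in the question; qty values are illustrative
DT1 <- data.table(loc     = rep(c("L1", "L2"), 6),
                  product = rep(c("P1", "P2", "P3"), 4),
                  qty     = (1:12) / 10)
DT2 <- data.table(product = c("P1", "P2", "P3"),
                  family  = c("A", "A", "B"),
                  price   = c(5, 7, 10))

# One output row per row of DT2; product comes from the grouping itself
res <- DT1[DT2, .(family, val = sum(qty * price)), on = "product", by = .EACHI]
names(res)
```

The result should then carry product, family and val once each.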

But not with by = product:

DT1[DT2,.(family, product, val = sum(qty*price)), on = "product", by = product]
#Error in `[.data.table`(DT1, DT2, .(family, product, val = sum(qty * price)),  : 
#object 'price' not found

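In this case, price is no longer found in the i table.

Here .EACHI is usable because the by element is the unique key of DT2.

However, if I want to group by an attribute of DT2, it seems I cannot use .EACHI. I achieved what I wanted as follows: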

DT1[DT2, .(family, product, val = qty*price), on = "product"][, .(sum(val)), by = family]
#   family       V1
#1:      A 31.11460
#2:      B 28.25194

Is this double handling necessary, or is there another piece of syntax I could use in this case?
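One possible alternative, offered as a sketch rather than a definitive answer, is an update join: copy the needed DT2 columns into the left table by reference using the i. prefix, then run an ordinary grouped aggregation. This is still two steps, but the second is a plain by = family. The qty values are fixed for reproducibility and the name DT is introduced here only for illustration.

```r
library(data.table)

# Same shape as the tables in the question; qty values are illustrative
DT1 <- data.table(loc     = rep(c("L1", "L2"), 6),
                  product = rep(c("P1", "P2", "P3"), 4),
                  qty     = (1:12) / 10)
DT2 <- data.table(product = c("P1", "P2", "P3"),
                  family  = c("A", "A", "B"),
                  price   = c(5, 7, 10))

# Work on a copy so that DT1 itself is not modified by reference
DT <- copy(DT1)

# Update join: add family and price from DT2 to the matching rows of DT
DT[DT2, on = "product", `:=`(family = i.family, price = i.price)]

# family and price are now ordinary columns, so any by works
res <- DT[, .(val = sum(qty * price)), by = family]
res
```

Whether this is cleaner than the chained form depends on whether adding columns to a copy of the table is acceptable; merge(DT1, DT2, by = "product") followed by the same aggregation is another option.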

Cés*_*ero -4

For several reasons, I would not use this approach for summarizing. You can join and summarize easily with dplyr or plyr, like this:

require(data.table)
library(dplyr)
set.seed(1)

DT1 <- data.table(loc = c("L1","L2"), product = c("P1","P2","P3"), qty = runif(12))
DT2 <- data.table(product = c("P1","P2","P3"), family = c("A","A","B"), price = c(5,7,10))

DT1 %>% 
    left_join(DT2, by='product') %>% 
    mutate(val = qty*price) %>% 
    group_by(product, family) %>% 
    summarize(V1 = sum(val))

#   product family    V1
#   <chr>   <chr>  <dbl>
# 1 P1      A       10.9
# 2 P2      A       10.1
# 3 P3      B       22.8

Hope this helps.

  • If he is using the data.table approach, offering a dplyr solution is not helpful. (3 upvotes)