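I want to use columns of the i table for calculations and grouping in a data.table join. The syntax seems to have some limitations. Can you suggest a cleaner approach?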
require(data.table)
set.seed(1)
Table 1
DT1 <- data.table(loc = c("L1","L2"), product = c("P1","P2","P3"), qty = runif(12))
Table 2
DT2 <- data.table(product = c("P1","P2","P3"), family = c("A","A","B"), price = c(5,7,10))
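A plain join of the two tables works fine: [not the issue here, but the requirement to quote column names in the on clause seems inconsistent with the rest of data.table]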
DT1[DT2, on = "product"]
# loc product qty family price
# 1: L1 P1 0.1297134 A 5
# 2: L2 P1 0.2423550 A 5
# 3: L1 P1 0.3421633 A 5
# 4: L2 P1 0.6537663 A 5
# 5: L2 P2 0.9822407 A 7
# 6: L1 P2 0.8568853 A 7
# 7: L2 P2 0.7062672 A 7
# 8: L1 P2 0.9224086 A 7
# 9: L1 P3 0.8267184 B 10
#10: L2 P3 0.8408788 B 10
#11: L1 P3 0.6212432 B 10
#12: L2 P3 0.5363538 B 10
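On the side remark about quoting: as far as I know, `on` also accepts an unquoted list form, `on = .(product)`, and a named character vector that maps x columns to i columns. The sketch below compares the three forms. The fixed `qty` values are an assumption made only so the result does not depend on the random seed.

```r
library(data.table)

# Same shape as the tables in the question; qty is fixed (illustrative
# values, not the original data) so the comparison is reproducible.
DT1 <- data.table(loc     = rep(c("L1", "L2"), 6),
                  product = rep(c("P1", "P2", "P3"), 4),
                  qty     = (1:12) / 10)
DT2 <- data.table(product = c("P1", "P2", "P3"),
                  family  = c("A", "A", "B"),
                  price   = c(5, 7, 10))

quoted   <- DT1[DT2, on = "product"]               # character vector form
unquoted <- DT1[DT2, on = .(product)]              # list form, no quotes
named    <- DT1[DT2, on = c(product = "product")]  # explicit x = i mapping
```

All three should return the same 12-row join, so the quoted form is a choice rather than a requirement.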
Calculations that use columns from both tables work fine:
DT1[DT2, .(family, product, val = qty*price), on = "product"]
# family product val
# 1: A P1 0.6485671
# 2: A P1 1.2117750
# 3: A P1 1.7108164
# 4: A P1 3.2688313
# 5: A P2 6.8756851
# 6: A P2 5.9981971
# 7: A P2 4.9438704
# 8: A P2 6.4568599
# 9: B P3 8.2671841
#10: B P3 8.4087878
#11: B P3 6.2124323
#12: B P3 5.3635379
I can group and aggregate with by = .EACHI:
DT1[DT2,.(family, product, val = sum(qty*price)), on = "product", by = .EACHI]
# product family product val
#1: P1 A P1 6.83999
#2: P2 A P1 24.27461
#3: P3 B P1 28.25194
But not with by = product:
DT1[DT2,.(family, product, val = sum(qty*price)), on = "product", by = product]
#Error in `[.data.table`(DT1, DT2, .(family, product, val = sum(qty * price)), :
#object 'price' not found
With by = product, price is no longer found in the i table.
In the previous case, by = .EACHI is usable because the by element is a unique key of DT2.
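To make the role of the unique key concrete, here is a minimal sketch of how `by = .EACHI` forms its groups: one group per row of the i table. The fixed `qty` values and the `DT2dup` table are assumptions added for illustration only.

```r
library(data.table)

# Illustrative data with fixed qty values (not the original random data).
DT1 <- data.table(loc     = rep(c("L1", "L2"), 6),
                  product = rep(c("P1", "P2", "P3"), 4),
                  qty     = (1:12) / 10)
DT2 <- data.table(product = c("P1", "P2", "P3"),
                  family  = c("A", "A", "B"),
                  price   = c(5, 7, 10))

# One result row per row of DT2; .N is the number of DT1 rows matched.
per_product <- DT1[DT2, .(family, n = .N, val = sum(qty * price)),
                   on = "product", by = .EACHI]

# Hypothetical i table where product is no longer unique: P1 appears twice,
# so by = .EACHI yields two separate P1 groups instead of one combined group.
DT2dup  <- rbind(DT2, DT2[1])
per_row <- DT1[DT2dup, .(n = .N), on = "product", by = .EACHI]
```

So `.EACHI` only behaves like "group by product" while each product occurs once in DT2.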
However, if I want to group by an attribute of DT2, I don't seem to be able to use .EACHI. I got the result I wanted this way:
DT1[DT2, .(family, product, val = qty*price), on = "product"][, .(sum(val)), by = family]
# family V1
#1: A 31.11460
#2: B 28.25194
Is this two-step processing necessary, or is there other syntax I can use in this case?
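For comparison, two data.table-only variants of the family total are sketched below. Both still take two steps, which is consistent with the error above: once `by` names a column, the i columns do not appear to be visible in `j`. The fixed `qty` values and the names `DT1x`, `by_family_chain` and `by_family_update` are assumptions for illustration.

```r
library(data.table)

# Illustrative data with fixed qty values (not the original random data).
DT1 <- data.table(loc     = rep(c("L1", "L2"), 6),
                  product = rep(c("P1", "P2", "P3"), 4),
                  qty     = (1:12) / 10)
DT2 <- data.table(product = c("P1", "P2", "P3"),
                  family  = c("A", "A", "B"),
                  price   = c(5, 7, 10))

# Variant A: join once, then aggregate directly on the joined columns,
# without building an intermediate val column.
by_family_chain <- DT1[DT2, on = "product"][
  , .(V1 = sum(qty * price)), by = family]

# Variant B: update join on a copy, adding the DT2 attributes by reference,
# then group the enriched table by the new family column.
DT1x <- copy(DT1)
DT1x[DT2, on = "product", `:=`(family = i.family, price = i.price)]
by_family_update <- DT1x[, .(V1 = sum(qty * price)), by = family]
```

Variant B may be the more convenient one when several aggregations over DT2 attributes are needed, since the join is done only once.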
Cés*_*ero -4
For several reasons, I would not use this approach for summarizing. You can join and summarize easily with dplyr or plyr, like this:
require(data.table)
library(dplyr)
set.seed(1)
DT1 <- data.table(loc = c("L1","L2"), product = c("P1","P2","P3"), qty = runif(12))
DT2 <- data.table(product = c("P1","P2","P3"), family = c("A","A","B"), price = c(5,7,10))
DT1 %>%
left_join(DT2, by='product') %>%
mutate(val = qty*price) %>%
group_by(product, family) %>%
summarize(V1 = sum(val))
#   product family    V1
#   <chr>   <chr>  <dbl>
# 1 P1      A       10.9
# 2 P2      A       10.1
# 3 P3      B       22.8
Hope this helps.
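Note that the pipeline above groups by product and family, so it returns one row per product rather than the per-family total the question asks about. A minimal variation that groups by family only might look like this; the fixed `qty` values are an assumption so the result does not depend on the seed.

```r
library(data.table)
library(dplyr)

# Illustrative data with fixed qty values (not the original random data).
DT1 <- data.table(loc     = rep(c("L1", "L2"), 6),
                  product = rep(c("P1", "P2", "P3"), 4),
                  qty     = (1:12) / 10)
DT2 <- data.table(product = c("P1", "P2", "P3"),
                  family  = c("A", "A", "B"),
                  price   = c(5, 7, 10))

# Join on product, then total qty * price within each family.
by_family <- DT1 %>%
  left_join(DT2, by = "product") %>%
  group_by(family) %>%
  summarise(V1 = sum(qty * price))
```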