che*_*gcj 6 group-by r data.table
I have the following data.table:
> dt = data.table(sales_ccy = c("USD", "EUR", "GBP", "USD"), sales_amt = c(500,600,700,800), cost_ccy = c("GBP","USD","GBP","USD"), cost_amt = c(-100,-200,-300,-400))
> dt
sales_ccy sales_amt cost_ccy cost_amt
1: USD 500 GBP -100
2: EUR 600 USD -200
3: GBP 700 GBP -300
4: USD 800 USD -400
My goal is to obtain the following data.table:
> dt
ccy total_amt
1: EUR 600
2: GBP 300
3: USD 700
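Basically, I want to sum all costs and sales by currency. In practice, this data.table has more than 500,000 rows, so I want a fast and efficient way to compute the totals.
Any ideas on how to do this quickly?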
With data.table v1.9.6+, the improved version of melt can melt multiple columns simultaneously:
require(data.table) # v1.9.6+
melt(dt, measure = patterns("_ccy$", "_amt$")
)[, .(tot_amt = sum(value2)), keyby = .(ccy=value1)]
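To make the intermediate step easier to follow, here is a minimal sketch of the same idea that names the melted columns with `value.name` instead of relying on the default `value1`/`value2`. The names `long`, `res`, `ccy`, and `amt` are only illustrative choices, and the sketch assumes data.table v1.9.6 or later.

```r
library(data.table)  # assumes v1.9.6+ for patterns() in melt

dt <- data.table(sales_ccy = c("USD", "EUR", "GBP", "USD"),
                 sales_amt = c(500, 600, 700, 800),
                 cost_ccy  = c("GBP", "USD", "GBP", "USD"),
                 cost_amt  = c(-100, -200, -300, -400))

# Stack the two currency columns into `ccy` and the two amount columns
# into `amt`; each original row becomes two rows (one sale, one cost).
long <- melt(dt,
             measure.vars = patterns("_ccy$", "_amt$"),
             value.name   = c("ccy", "amt"))

# Sum per currency; keyby also sorts the result by ccy.
res <- long[, .(total_amt = sum(amt)), keyby = ccy]
res
```

Keeping `long` as a separate object is just for inspection; on a large table the two steps can be chained as in the one-liner above.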
You could consider merged.stack from my "splitstackshape" package.
Here I also use "dplyr" for piping, but you can skip it if you prefer.
library(dplyr)
library(splitstackshape)
dt %>%
mutate(id = 1:nrow(dt)) %>%
merged.stack(var.stub = c("ccy", "amt"), sep = "var.stubs", atStart = FALSE) %>%
.[, .(total_amt = sum(amt)), by = ccy]
# ccy total_amt
# 1: GBP 300
# 2: USD 700
# 3: EUR 600
The development version of "data.table" should be able to handle melting groups of columns. It is also faster than merged.stack.
Dirty but effective:
# Bind costs and sales into one long table (dt is the table from the question)
df <- rbind(dt[, list(ccy = cost_ccy, total_amt = cost_amt)],
            dt[, list(ccy = sales_ccy, total_amt = sales_amt)])
# Sum for every currency
df[, sum(total_amt), by = ccy]
ccy V1
1: GBP 300
2: USD 700
3: EUR 600