R data.table: group by multiple columns combined into one column and sum

che*_*gcj 6 group-by r data.table

I have the following data.table:

> dt = data.table(sales_ccy = c("USD", "EUR", "GBP", "USD"), sales_amt = c(500,600,700,800), cost_ccy = c("GBP","USD","GBP","USD"), cost_amt = c(-100,-200,-300,-400))
> dt
   sales_ccy sales_amt cost_ccy cost_amt
1:       USD       500      GBP     -100
2:       EUR       600      USD     -200
3:       GBP       700      GBP     -300
4:       USD       800      USD     -400

My goal is to get the following data.table:

> dt
   ccy total_amt
1: EUR       600
2: GBP       300
3: USD       700

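Basically, I want to sum all costs and sales by currency. In practice this data.table has more than 500,000 rows, so I'd like a fast and efficient way to compute the totals.

Any ideas on how to do this quickly?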
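To pin down what "sum all costs and sales by currency" means, here is a small base R sketch (not part of the original thread) that concatenates both currency columns and both amount columns and sums per currency with `tapply()`. For example, USD = 500 + 800 - 200 - 400 = 700. It copies whole columns, so treat it only as a reference check on the expected totals, not as the fast solution being asked for.

```r
# Rebuild the example columns as plain vectors (no packages needed)
sales_ccy <- c("USD", "EUR", "GBP", "USD")
sales_amt <- c(500, 600, 700, 800)
cost_ccy  <- c("GBP", "USD", "GBP", "USD")
cost_amt  <- c(-100, -200, -300, -400)

# Stack currencies and amounts from both column pairs, then sum per currency;
# tapply() turns the character index into a factor, so names come out sorted
totals <- tapply(c(sales_amt, cost_amt), c(sales_ccy, cost_ccy), sum)
print(totals)
```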

Aru*_*run 9

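With data.table v1.9.6+, the improved `melt` can melt several groups of columns at once:

library(data.table) # v1.9.6+
# Melt the *_ccy and *_amt column pairs together: value1 = currency, value2 = amount
melt(dt, measure.vars = patterns("_ccy$", "_amt$"))[
  , .(tot_amt = sum(value2)), keyby = .(ccy = value1)]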
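If you'd rather not work with the default `value1`/`value2` names, `melt()` also accepts one `value.name` per pattern group when `measure.vars` is a list. A minimal sketch, assuming a data.table version that supports this; the column names `ccy`/`amt` and the variables `long`/`res` are illustrative choices, not from the answer above.

```r
library(data.table)

dt <- data.table(sales_ccy = c("USD", "EUR", "GBP", "USD"), sales_amt = c(500, 600, 700, 800),
                 cost_ccy  = c("GBP", "USD", "GBP", "USD"), cost_amt  = c(-100, -200, -300, -400))

# Melt both column groups at once; value.name names the two resulting value columns
long <- melt(dt, measure.vars = patterns("_ccy$", "_amt$"),
             value.name = c("ccy", "amt"))

# Sum per currency; keyby also sorts the result by ccy
res <- long[, .(total_amt = sum(amt)), keyby = ccy]
print(res)
```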


A5C*_*2T1 7

You could consider `merged.stack` from my "splitstackshape" package.

Here I'm also using "dplyr" for piping, but you can skip it if you prefer.

library(dplyr)
library(splitstackshape)

dt %>%
  mutate(id = 1:nrow(dt)) %>%
  merged.stack(var.stub = c("ccy", "amt"), sep = "var.stubs", atStart = FALSE) %>%
  .[, .(total_amt = sum(amt)), by = ccy]
#    ccy total_amt
# 1: GBP       300
# 2: USD       700
# 3: EUR       600

The development version of "data.table" should be able to melt groups of columns directly. It is also faster than `merged.stack`.


PoG*_*bas 2

Dirty, but it works:

library(data.table)
# Bind costs and sales into one two-column table
long <- rbind(dt[, list(ccy = cost_ccy, total_amt = cost_amt)],
              dt[, list(ccy = sales_ccy, total_amt = sales_amt)])
# Sum for every currency
long[, sum(total_amt), by = ccy]
   ccy  V1
1: GBP 300
2: USD 700
3: EUR 600
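The same stacking idea can use data.table's `rbindlist()` in place of `rbind()` and give the summed column a proper name instead of `V1`. This is a sketch that has not been benchmarked here; `long` and `res` are illustrative names. Because it uses `by` rather than `keyby`, currencies appear in order of first appearance (costs first), i.e. GBP, USD, EUR.

```r
library(data.table)

dt <- data.table(sales_ccy = c("USD", "EUR", "GBP", "USD"), sales_amt = c(500, 600, 700, 800),
                 cost_ccy  = c("GBP", "USD", "GBP", "USD"), cost_amt  = c(-100, -200, -300, -400))

# Build one (currency, amount) table per column pair and row-bind them with rbindlist()
long <- rbindlist(list(
  dt[, .(ccy = cost_ccy,  amt = cost_amt)],
  dt[, .(ccy = sales_ccy, amt = sales_amt)]
))

# Name the summed column total_amt rather than the default V1
res <- long[, .(total_amt = sum(amt)), by = ccy]
print(res)
```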