McR*_*oon 2 grouping r dplyr tidyverse
A simplified version of my dataset can be reproduced with:
df <- data.frame(buyer = c("A","C","B"),
                 seller = c("B","D","E"),
                 amount = c(1,2,3))
I am looking for a solution, preferably in dplyr, to achieve the following.
buyer          seller       amount
  A              B           1
  C              D           2
  B              E           3
A grouped summary should be generated for each agent (A, B, C, D, E), summing the amounts over every row in which the agent appears as either buyer or seller:
output
agent     total_amount
  A        1
  B        4 #(=1+3)
  C        2
  D        2
  E        3
I can group_by buyer and seller separately and then add the results together, but this is inelegant and somewhat cumbersome.
library(dplyr)
res_b <- df %>%
      group_by(buyer) %>%
      summarise(total_amount=sum(amount))
res_s <- df %>%
      group_by(seller) %>%
      summarise(total_amount=sum(amount))
Any help is appreciated. Other (non-tidyverse) solutions are obviously welcome too.
Edit: I should have mentioned that my original dataset has roughly 60 million observations.
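For reference, the "add the results" step mentioned above could be sketched roughly as follows. This is only one way to finish the two-summary approach: the rename to a common `agent` column and the final `res` object are illustrative choices that are not part of the original code, and `stringsAsFactors = FALSE` is set explicitly so the sketch behaves the same on R versions before 4.0.

```r
library(dplyr)

df <- data.frame(buyer = c("A", "C", "B"),
                 seller = c("B", "D", "E"),
                 amount = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Per-buyer totals, with the key column renamed to a common name
res_b <- df %>%
  group_by(buyer) %>%
  summarise(total_amount = sum(amount)) %>%
  rename(agent = buyer)

# Per-seller totals, renamed the same way
res_s <- df %>%
  group_by(seller) %>%
  summarise(total_amount = sum(amount)) %>%
  rename(agent = seller)

# Stack both summaries and add up the totals for agents present in both
res <- bind_rows(res_b, res_s) %>%
  group_by(agent) %>%
  summarise(total_amount = sum(total_amount))
```

It works, but it needs two passes over the data plus a third aggregation, which is the cumbersome part.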
We can first convert to long format and then do a simple aggregation, i.e.
library(tidyverse)
df %>% 
 gather(var, agent, -amount) %>% 
 group_by(agent) %>% 
 summarise(total_amount = sum(amount))
which gives,
# A tibble: 5 x 2
  agent total_amount
  <chr>        <dbl>
1 A                1
2 B                4
3 C                2
4 D                2
5 E                3
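As a side note, `gather()` has been superseded in tidyr (1.0.0 and later) by `pivot_longer()`. A sketch of the same reshape-then-aggregate idea with the newer function might look like this; the column name `role` is an arbitrary choice made here for illustration, and `stringsAsFactors = FALSE` is set only so the example behaves the same on older R versions.

```r
library(dplyr)
library(tidyr)

df <- data.frame(buyer = c("A", "C", "B"),
                 seller = c("B", "D", "E"),
                 amount = c(1, 2, 3),
                 stringsAsFactors = FALSE)

res <- df %>%
  # Stack buyer and seller into one "agent" column; "role" records the source column
  pivot_longer(cols = c(buyer, seller), names_to = "role", values_to = "agent") %>%
  group_by(agent) %>%
  summarise(total_amount = sum(amount))
```

The result should match the `gather()` version; only the reshaping call differs.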
You can try data.table for better efficiency. Here is a direct translation of the tidyverse code above,
library(data.table)
dt1 <- setDT(df)
melt(dt1, measure.vars = c('buyer', 'seller'), id.vars = 'amount', value.name = "agent"
     )[, .(total_amount = sum(amount)), by = agent][]
#   agent total_amount
#1:     A            1
#2:     C            2
#3:     B            4
#4:     D            2
#5:     E            3
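If you would rather not reshape at all, a possible variant is to stack the two agent columns by hand and aggregate once. This is only a sketch: it assumes `buyer` and `seller` are character vectors (not factors), the intermediate name `long` is made up for illustration, and it has not been benchmarked against `melt()` on data of the size mentioned in the question.

```r
library(data.table)

df <- data.frame(buyer = c("A", "C", "B"),
                 seller = c("B", "D", "E"),
                 amount = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Stack buyers and sellers into one agent column; each amount is repeated once per role
long <- data.table(agent = c(df$buyer, df$seller),
                   amount = rep(df$amount, times = 2))

# Single grouped sum over the stacked table
res <- long[, .(total_amount = sum(amount)), by = agent]
```

Unlike `setDT(df)`, which converts `df` to a data.table by reference, this leaves the original data frame untouched.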