Join data frames based on item presence in transactions

jna*_*0ne · score 1 · tags: merge, join, r, dplyr

I have two data frames that I am trying to merge. The first is a list of items and their associated data, for example:

items <- data.frame(
  item_code = c(1111, 2222, 3333, 4444),
  item_category = c("cata","catb","catc","catd")
  )

The second is a list of transactions:

transactions <- data.frame(
  tran_code = c('aaaa', 'bbbb', 'cccc', 'dddd'),
  tran_items = c("1111,1111,2222","3333,2222","1111,4444,4444","3333")
  )

I am trying to create a column that holds, for each item, the list of transactions in which that item appears, like this:

view(final_df)

item_code item_category in_trans
1111      "cata"        "aaaa,cccc"
2222      "catb"        "aaaa,bbbb"
3333      "catc"        "bbbb,dddd"
4444      "catd"        "cccc"

Can anyone suggest how to achieve this?

Jaa*_*aap · score 5

Using the splitstackshape and data.table packages:

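library(splitstackshape) # this will also load the 'data.table'-package

setDT(items)          # convert both data frames to data.tables in place
setDT(transactions)

# split 'tran_items' into one row per item, drop items repeated within a transaction,
# join the result to 'items', then collapse the transaction codes for each item
items[unique(cSplit(transactions, 'tran_items', ',', 'long')), on = .(item_code = tran_items),
      ][, .(in_trans = toString(tran_code)), by = .(item_code, item_category)]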

which gives:

   item_code item_category   in_trans
1:      1111          cata aaaa, cccc
2:      2222          catb aaaa, bbbb
3:      3333          catc bbbb, dddd
4:      4444          catd       cccc
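The `unique()` call matters because some transactions list the same item more than once: "aaaa" contains 1111 twice and "cccc" contains 4444 twice, so without deduplication those transaction codes would be repeated in `in_trans`. The sketch below reproduces the long-format step in base R with `strsplit()` rather than `cSplit()`, so treat it as an approximation of what `cSplit(..., 'long')` returns (column types and row names may differ), not its actual output. The names `parts`, `long` and `long_unique` are illustrative only.

```r
# Sample transactions from the question (character columns, also in R < 4.0)
transactions <- data.frame(
  tran_code = c("aaaa", "bbbb", "cccc", "dddd"),
  tran_items = c("1111,1111,2222", "3333,2222", "1111,4444,4444", "3333"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string: one element per item occurrence
parts <- strsplit(transactions$tran_items, ",", fixed = TRUE)

# Long format: repeat each transaction code once per item it lists
long <- data.frame(
  tran_code = rep(transactions$tran_code, lengths(parts)),
  tran_items = as.integer(unlist(parts)),
  stringsAsFactors = FALSE
)

nrow(long)                  # 9 rows, including the repeated 1111 and 4444 entries
long_unique <- unique(long) # drop repeated (tran_code, tran_items) pairs
nrow(long_unique)           # 7 rows
```

The tidyverse answer below uses `distinct()` for the same purpose.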

With the tidyverse, you can do:

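library(dplyr)
library(tidyr)

items %>% 
  left_join(., transactions %>% 
              separate_rows(tran_items) %>%                      # one row per item in each transaction
              distinct() %>%                                     # drop items repeated within a transaction
              group_by(tran_items = as.numeric(tran_items)) %>%  # numeric, to match 'item_code'
              summarise(in_trans = toString(tran_code)),         # collapse the transaction codes
            by = c('item_code' = 'tran_items'))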
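For comparison, here is a minimal base-R sketch that needs no extra packages; it is an alternative technique (membership tests with `%in%`), not what either answer does. Both answers use `toString()`, which separates values with `", "` (giving `"aaaa, cccc"`), so this sketch uses `paste(collapse = ",")` to match the question's `"aaaa,cccc"` format exactly. One behavioural difference to be aware of: an item that appears in no transaction gets an empty string here, whereas `left_join()` gives `NA` and the data.table join as written drops the item. The helper names `tran_parts` and `hits` are illustrative only.

```r
# Sample data from the question
items <- data.frame(
  item_code = c(1111, 2222, 3333, 4444),
  item_category = c("cata", "catb", "catc", "catd"),
  stringsAsFactors = FALSE
)
transactions <- data.frame(
  tran_code = c("aaaa", "bbbb", "cccc", "dddd"),
  tran_items = c("1111,1111,2222", "3333,2222", "1111,4444,4444", "3333"),
  stringsAsFactors = FALSE
)

# Split each transaction's item string into a numeric vector of item codes
tran_parts <- lapply(strsplit(transactions$tran_items, ",", fixed = TRUE), as.numeric)

# For each item, keep the transactions whose item vector contains it;
# %in% tests membership, so repeated items within a transaction are harmless
final_df <- items
final_df$in_trans <- vapply(
  items$item_code,
  function(code) {
    hits <- vapply(tran_parts, function(p) code %in% p, logical(1))
    paste(transactions$tran_code[hits], collapse = ",")
  },
  character(1)
)

print(final_df)
```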