I have two data frames that I'm trying to merge. The first is a list of items and their associated data, for example:
items <- data.frame(
item_code = c(1111, 2222, 3333, 4444),
item_category = c("cata","catb","catc","catd")
)
The second is a list of transactions:
transactions <- data.frame(
tran_code = c('aaaa', 'bbbb', 'cccc', 'dddd'),
tran_items = c("1111,1111,2222","3333,2222","1111,4444,4444","3333")
)
I'm trying to create a column where each cell lists the transactions in which that item appears, like this:
view(final_df)
item_code item_category in_trans
1111 "cata" "aaaa,cccc"
2222 "catb" "aaaa,bbbb"
3333 "catc" "bbbb,dddd"
4444 "catd" "cccc"
Can anyone suggest how to achieve this?
Using the splitstackshape and data.table packages:
library(splitstackshape) # this will also load the 'data.table'-package
setDT(items)
setDT(transactions)
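# cSplit(..., 'long') gives one row per (tran_code, item); unique() drops items repeated within a transaction
items[unique(cSplit(transactions, 'tran_items', ',', 'long')), on = .(item_code = tran_items),
      ][, .(in_trans = toString(tran_code)), by = .(item_code, item_category)]  # collapse codes per item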
which gives:
   item_code item_category   in_trans
1:      1111          cata aaaa, cccc
2:      2222          catb aaaa, bbbb
3:      3333          catc bbbb, dddd
4:      4444          catd       cccc
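The idea behind this answer is: reshape the transactions to long format (one row per transaction/item pair), drop duplicates, collapse the transaction codes per item, and join the result back onto the items. If splitstackshape isn't available, the same steps can be sketched in base R. This is a swapped-in base-R technique, not part of the answer above, and it assumes only the example data from the question:

```r
# Example data from the question (stringsAsFactors = FALSE for R < 4.0)
items <- data.frame(
  item_code = c(1111, 2222, 3333, 4444),
  item_category = c("cata", "catb", "catc", "catd"),
  stringsAsFactors = FALSE
)
transactions <- data.frame(
  tran_code = c("aaaa", "bbbb", "cccc", "dddd"),
  tran_items = c("1111,1111,2222", "3333,2222", "1111,4444,4444", "3333"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a character vector (one per transaction)
pieces <- strsplit(transactions$tran_items, ",", fixed = TRUE)

# Long format: repeat each tran_code once per item it lists
long <- data.frame(
  tran_code  = rep(transactions$tran_code, lengths(pieces)),
  tran_items = as.numeric(unlist(pieces)),
  stringsAsFactors = FALSE
)

# Drop items repeated within the same transaction (e.g. "1111,1111")
long <- unique(long)

# Collapse the transaction codes for each item into one string
agg <- aggregate(tran_code ~ tran_items, data = long, FUN = toString)
names(agg) <- c("item_code", "in_trans")

# Attach the collapsed strings to the item table; all.x keeps items with no transactions
final_df <- merge(items, agg, by = "item_code", all.x = TRUE)
final_df
```

Because toString() is used here too, the codes are separated by ", ", as in the data.table output.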
With the tidyverse, you can do:
library(dplyr)
library(tidyr)
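items %>%
  left_join(., transactions %>%
              separate_rows(tran_items) %>%   # one row per (tran_code, item)
              distinct() %>%                  # drop items repeated within a transaction
              group_by(tran_items = as.numeric(tran_items)) %>%  # numeric, to match item_code
              summarise(in_trans = toString(tran_code)),          # collapse codes per item
            by = c('item_code' = 'tran_items'))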
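toString() separates the codes with ", ", whereas the desired output in the question shows "aaaa,cccc". Below is a minimal variant of the tidyverse pipeline that matches that format exactly. It assumes dplyr (1.0 or later) and tidyr are installed, swaps toString() for paste(collapse = ","), and passes sep = "," to separate_rows() explicitly:

```r
library(dplyr)
library(tidyr)

# Example data from the question (stringsAsFactors = FALSE for R < 4.0)
items <- data.frame(
  item_code = c(1111, 2222, 3333, 4444),
  item_category = c("cata", "catb", "catc", "catd"),
  stringsAsFactors = FALSE
)
transactions <- data.frame(
  tran_code = c("aaaa", "bbbb", "cccc", "dddd"),
  tran_items = c("1111,1111,2222", "3333,2222", "1111,4444,4444", "3333"),
  stringsAsFactors = FALSE
)

# One row per (tran_code, item), with the item code converted to numeric
trans_long <- transactions %>%
  separate_rows(tran_items, sep = ",") %>%
  mutate(tran_items = as.numeric(tran_items)) %>%
  distinct()  # drop items repeated within a transaction

final_df <- items %>%
  left_join(
    trans_long %>%
      group_by(tran_items) %>%
      summarise(in_trans = paste(tran_code, collapse = ",")),  # no space after the comma
    by = c("item_code" = "tran_items")
  )
final_df
```

left_join() keeps every row of items in its original order, so an item that appears in no transaction would get NA in in_trans.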