Splitting data from one table into separate tables based on a condition

Question

I am doing a cross-selling analysis of several products in R. I have already transformed the transaction data, and it looks like this:

  df.articles <- cbind.data.frame(Art01,Art02,Art03)

  Art01         Art02      Art03
  bread         yoghurt    egg
  butter        bread      yoghurt
  cheese        butter     bread
  egg           cheese     NA
  potato        NA         NA

 Actual data is 'data.frame': 69099 obs. of  33 variables.

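I want to list every distinct item together with the number of times each other item was sold with it (in this case, bread or yoghurt). The actual data contains 56 items, and I need to check every item that was cross-sold with each of them. So the result I want should look like this: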

     Products sold with **bread**           Products sold with **Yoghurt**  

     yoghurt         2                        bread   2
     egg             1                        egg     1
     cheese          1                       butter   1
     butter          1          

     .... and list goes on like this for say 52 different articles.

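I have tried a few things, but they are too manual for a dataset this large. It would be great to solve this with library(data.table), but if not, that is fine too. Thank you very much in advance.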

Answer 1

Fra*_*ank 5

There's...

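library(data.table)
# DF is the question's data frame (called df.articles there)
setDT(DF)
# Long format: one row per (order r, article); drop NAs and the source column name
dat = setorder(melt(DF[, r := .I], id="r", na.rm=TRUE)[, !"variable"])
# Cross join the articles within each order, drop self-pairs, count each pair
res = dat[, CJ(art = value, other_art = value), by=r][art != other_art, .N, keyby=.(art, other_art)]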

        art other_art N
 1:   bread    butter 2
 2:   bread    cheese 1
 3:   bread       egg 1
 4:   bread   yoghurt 2
 5:  butter     bread 2
 6:  butter    cheese 1
 7:  butter   yoghurt 1
 8:  cheese     bread 1
 9:  cheese    butter 1
10:  cheese       egg 1
11:     egg     bread 1
12:     egg    cheese 1
13:     egg   yoghurt 1
14: yoghurt     bread 2
15: yoghurt    butter 1
16: yoghurt       egg 1
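
To make the per-order step concrete, here is a minimal sketch of what CJ produces for a single order, using the first row of the question's sample data. The names `items` and `pairs` are illustrative only and do not appear in the answer.

```r
library(data.table)

# One order from the question's sample data (row 1)
items <- c("bread", "yoghurt", "egg")

# CJ builds every ordered pair, including each item paired with itself
pairs <- CJ(art = items, other_art = items)
nrow(pairs)               # 3 * 3 = 9 rows

# Dropping self-pairs leaves the ordered cross-sell pairs for this order
cross <- pairs[art != other_art]
nrow(cross)               # 9 - 3 = 6 rows
```

The answer's code does the same thing once per order (by=r) and then counts how often each (art, other_art) pair occurs across all orders.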

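Comment. The OP mentions 56 distinct items, which means a single order (r above) can expand to as many as 56^2 = 3136 rows in CJ if it contains every item. With a few thousand orders, this quickly becomes a problem. That is common in combinatorial computations, so hopefully this task is only for browsing the data rather than analyzing it.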
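
If the per-order cross join does grow too large, one possible alternative is sketched below. It is not part of the answer above; it swaps in a different technique (an order-by-article incidence matrix and its cross-product) and uses base R only. The names `m_chr`, `inc`, and `co` are hypothetical, and the sketch assumes one order per row, as in the question's sample.

```r
# Toy data from the question (assumed layout: one order per row)
df.articles <- data.frame(
  Art01 = c("bread", "butter", "cheese", "egg", "potato"),
  Art02 = c("yoghurt", "bread", "butter", "cheese", NA),
  Art03 = c("egg", "yoghurt", "bread", NA, NA),
  stringsAsFactors = FALSE
)

# Order id and article for every non-missing cell
m_chr    <- as.matrix(df.articles)
keep     <- !is.na(m_chr)
order_id <- row(m_chr)[keep]
article  <- m_chr[keep]

# Order-by-article incidence matrix: 1 if the order contains the article
inc <- unclass(table(order_id, article)) > 0
storage.mode(inc) <- "integer"

# Article-by-article co-occurrence counts; zero out the self-pairs
co <- crossprod(inc)
diag(co) <- 0

co["bread", ]   # counts of articles sold together with bread
```

The result is one square matrix with a row and column per article (56 x 56 for the real data), so its size does not depend on the number of orders. One difference from the CJ approach: an article repeated within the same order is counted once here.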

Another idea when browsing is to use split and lapply to customize the display:

library(magrittr)
split(res, by="art", keep.by = FALSE) %>% lapply(. %$% setNames(N, other_art))

$bread
 butter  cheese     egg yoghurt 
      2       1       1       2 

$butter
  bread  cheese yoghurt 
      2       1       1 

$cheese
 bread butter    egg 
     1      1      1 

$egg
  bread  cheese yoghurt 
      1       1       1 

$yoghurt
 bread butter    egg 
     2      1      1

As @ycw suggested in a comment, though, I usually just explore with res[art == "bread"], res[art == "bread" & other_art == "butter"], and so on.

Magrittr isn't really needed here; it just allows a different syntax.
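
For illustration, the same display can be written without magrittr, using an anonymous function in lapply. This is a sketch, not the answer's own code: `res` is rebuilt here by hand from the bread and yoghurt rows of the output above so that the block runs on its own, and `out` is an illustrative name.

```r
library(data.table)

# A few rows copied from the answer's output (bread and yoghurt only)
res <- data.table(
  art       = c("bread", "bread", "bread", "bread", "yoghurt", "yoghurt", "yoghurt"),
  other_art = c("butter", "cheese", "egg", "yoghurt", "bread", "butter", "egg"),
  N         = c(2L, 1L, 1L, 2L, 2L, 1L, 1L)
)

# Same split/lapply idea, with a plain function instead of the pipe
out <- lapply(
  split(res, by = "art", keep.by = FALSE),
  function(d) setNames(d$N, d$other_art)
)

out$bread   # named vector of counts for articles sold with bread
```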

Archived:	8 years, 3 months ago
Views:	77
Last recorded:	8 years, 3 months ago