Count unique values of a column by pairwise combinations of another column, grouped by a third column, in R

sha*_*nas 5 r dataframe data.table

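To be honest, this is a fairly complex task. It is basically an extension of a question I asked earlier: "Count unique values of a column by pairwise combinations of another column in R".

This time, say I have the following data frame in R: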

data.frame(Reg.ID = c(1,1,2,2,2,3,3), Location = c("X","X","Y","Y","Y","X","X"), Product = c("A","B","A","B","C","B","A"))

The data looks like this:

      Reg.ID Location Product
1      1        X       A
2      1        X       B
3      2        Y       A
4      2        Y       B
5      2        Y       C
6      3        X       B
7      3        X       A

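I want to count the unique values of the "Reg.ID" column for each pairwise combination of values in the "Product" column, grouped by the "Location" column. The result should look like this: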

  Location Prod.Comb Count
1        X       A,B     2
2        Y       A,B     1
3        Y       A,C     1
4        Y       B,C     1

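I tried to get this output using base R functions, but without any success. I am guessing there is a fairly simple solution using the data.table package in R?

Any help would be greatly appreciated. Thanks!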

Rei*_*ica 6

This idea is not thoroughly tested, but it is the first thing that comes to mind with data.table:

library(data.table)
dt <- data.table(Reg.ID = c(1,1,2,2,2,3,3), Location = c("X","X","Y","Y","Y","X","X"), Product = c("A","B","A","B","C","B","A"))
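# Self-join on Location so that every product is paired with every other product at that location
dt.cj <- merge(dt, dt, by = "Location", all = TRUE, allow.cartesian = TRUE)
# Keep each unordered pair once (Product.x < Product.y) and count the distinct Reg.ID.x values per pair
dt.res <- dt.cj[Product.x < Product.y, .(cnt = length(unique(Reg.ID.x))), by = .(Location, Product.x, Product.y)]
dt.res
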
#    Location Product.x Product.y cnt
# 1:        X         A         B  2
# 2:        Y         A         B  1
# 3:        Y         A         C  1
# 4:        Y         B         C  1
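One caveat worth checking: the self-join above matches rows on Location only, so a product from one Reg.ID can be paired with a product from a different Reg.ID at the same location. That gives the expected counts for the sample data, but it may over-count when a registration holds only one of the two products. Below is a minimal sketch of a variant that also joins on Reg.ID, assuming the intent is to count registrations that contain both products of a pair. The names `pairs` and `res` are illustrative and not part of the original answer.

```r
library(data.table)

dt <- data.table(
  Reg.ID   = c(1, 1, 2, 2, 2, 3, 3),
  Location = c("X", "X", "Y", "Y", "Y", "X", "X"),
  Product  = c("A", "B", "A", "B", "C", "B", "A")
)

# Self-join on Location AND Reg.ID (assumption: two products form a pair
# only when the same registration holds both of them).
pairs <- merge(dt, dt, by = c("Location", "Reg.ID"), allow.cartesian = TRUE)

# Keep each unordered pair once, then count distinct registrations per pair.
res <- pairs[Product.x < Product.y,
             .(Count = uniqueN(Reg.ID)),
             by = .(Location, Prod.Comb = paste(Product.x, Product.y, sep = ","))]

# Sort for a stable, readable result.
setorder(res, Location, Prod.Comb)
print(res)
```

For the sample data this should reproduce the four rows requested in the question, with the pair columns already collapsed into a single `Prod.Comb` column.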

  • A similar approach: `dt[order(Product), CJ(Product, Product)[V1 < V2], by = .(Location, Reg.ID)][, .N, by = .(Location, V1, V2)]`. I think `CJ` is similar to your Cartesian join. (4 upvotes)