SQL query for a frequency distribution matrix of products

cod*_*r25 9 sql hive hiveql apache-spark

I want to create a frequency distribution matrix.

1. Create a matrix. **Is it possible to get the products in separate columns?**

  customer 1      p1         p2      p3
  customer 2      p2         p3
  customer 3      p2         p3      p1
  customer 4      p2         p1

2. Then I need to count which products are bought together most often.

   For example:
    p2 and p3 come together 3 times
    p1 and p3 come together 2 times
    p1 and p2 come together 3 times
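For step 1, one common way to get each customer's products into separate columns in plain SQL is to number the products per customer with `row_number()` and then pivot the positions with conditional aggregation (`max(case ...)`). The sketch below runs that pattern on an in-memory SQLite database so it can be executed locally; the `sales(customerId, product)` table, the sample rows and the fixed limit of three product columns are assumptions for illustration. Because the question has no purchase-order column, products are ordered by name here, so customer 3 comes out as p1, p2, p3. It needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Hypothetical sample data mirroring the matrix above:
# one row per (customerId, product) purchase.
ROWS = [
    ("customer 1", "p1"), ("customer 1", "p2"), ("customer 1", "p3"),
    ("customer 2", "p2"), ("customer 2", "p3"),
    ("customer 3", "p2"), ("customer 3", "p3"), ("customer 3", "p1"),
    ("customer 4", "p2"), ("customer 4", "p1"),
]

# Number each customer's distinct products (ordered by name, an assumption),
# then turn positions 1..3 into separate columns with conditional aggregation.
PIVOT_SQL = """
with numbered as (
    select customerId, product,
           row_number() over (partition by customerId order by product) as pos
    from (select distinct customerId, product from sales) as d
)
select customerId,
       max(case when pos = 1 then product end) as product_1,
       max(case when pos = 2 then product end) as product_2,
       max(case when pos = 3 then product end) as product_3
from numbered
group by customerId
order by customerId
"""


def product_matrix(rows):
    """Return one row per customer with up to three product columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table sales (customerId text, product text)")
    conn.executemany("insert into sales values (?, ?)", rows)
    result = conn.execute(PIVOT_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in product_matrix(ROWS):
        print(row)
```

Customers with fewer than three products get `None` (SQL `NULL`) in the unused columns; supporting more products means adding more `max(case ...)` columns.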

I want to recommend products to customers based on how often products are bought together.
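The counting in step 2 boils down to: for each customer, take the set of distinct products, generate every unordered pair, and tally how many customers produced each pair. A minimal plain-Python sketch of that logic, assuming the baskets from the matrix above (the `BASKETS` dictionary and `pair_counts` name are hypothetical), reproduces the counts listed in the question:

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets taken from the matrix above (customer -> products).
BASKETS = {
    "customer 1": ["p1", "p2", "p3"],
    "customer 2": ["p2", "p3"],
    "customer 3": ["p2", "p3", "p1"],
    "customer 4": ["p2", "p1"],
}


def pair_counts(baskets):
    """Count how many customers bought each unordered pair of products."""
    counts = Counter()
    for products in baskets.values():
        # sorted(set(...)) drops repeat purchases and fixes the order inside
        # each pair, so ("p2", "p1") and ("p1", "p2") count as the same pair.
        for pair in combinations(sorted(set(products)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    for pair, n in pair_counts(BASKETS).most_common():
        print(pair, n)
```

The pairs with the highest counts are the natural candidates for "customers who bought X also bought Y" recommendations.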

 select customerId,product,count(*) from sales group by customerId,product

Can anyone help me with this?

Gor*_*off 7

If you want the pairs of products bought by the same customer, you can use a self-join:

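select s1.product as product1, s2.product as product2,
       count(distinct s1.customerId) as cnt  -- customers, not sales rows
from sales s1 join
     sales s2
     on s1.customerId = s2.customerId
where s1.product < s2.product  -- each unordered pair once, no self-pairs
group by s1.product, s2.product
order by cnt desc;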
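To try the self-join locally, here is a sketch that runs it against an in-memory SQLite table. The sample rows are hypothetical and deliberately include customer 1 buying p1 twice: this variant counts `distinct s1.customerId` rather than rows, so repeat purchases of the same product do not inflate the pair count. The extra `product1, product2` sort keys only make the output order deterministic.

```python
import sqlite3

# Hypothetical sample data; customer 1 buys p1 twice on purpose.
ROWS = [
    (1, "p1"), (1, "p1"), (1, "p2"), (1, "p3"),
    (2, "p2"), (2, "p3"),
    (3, "p2"), (3, "p3"), (3, "p1"),
    (4, "p2"), (4, "p1"),
]

# Self-join on customerId; s1.product < s2.product keeps each pair once.
PAIR_SQL = """
select s1.product as product1, s2.product as product2,
       count(distinct s1.customerId) as cnt
from sales s1
join sales s2 on s1.customerId = s2.customerId
where s1.product < s2.product
group by s1.product, s2.product
order by cnt desc, product1, product2
"""


def pair_counts(rows):
    """Return (product1, product2, customers) tuples, most frequent first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table sales (customerId integer, product text)")
    conn.executemany("insert into sales values (?, ?)", rows)
    result = conn.execute(PAIR_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in pair_counts(ROWS):
        print(row)
```

With `count(*)` instead, the duplicated p1 row would be joined twice for customer 1, and the (p1, p2) pair would come out as 4 rather than 3.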

You can extend this to more than two products by adding more joins.
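As a sketch of that extension (same hypothetical `sales` table and sample data as before, run on SQLite), a third copy of the table is joined on `customerId`, and the chained condition `s1.product < s2.product and s2.product < s3.product` keeps each set of three products exactly once:

```python
import sqlite3

# Hypothetical sample data (customerId, product).
ROWS = [
    (1, "p1"), (1, "p2"), (1, "p3"),
    (2, "p2"), (2, "p3"),
    (3, "p2"), (3, "p3"), (3, "p1"),
    (4, "p2"), (4, "p1"),
]

# Three-way self-join: one extra join per additional product in the group.
TRIPLE_SQL = """
select s1.product as product1, s2.product as product2, s3.product as product3,
       count(distinct s1.customerId) as cnt
from sales s1
join sales s2 on s1.customerId = s2.customerId
join sales s3 on s1.customerId = s3.customerId
where s1.product < s2.product and s2.product < s3.product
group by s1.product, s2.product, s3.product
order by cnt desc
"""


def triple_counts(rows):
    """Return (product1, product2, product3, customers) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table sales (customerId integer, product text)")
    conn.executemany("insert into sales values (?, ?)", rows)
    result = conn.execute(TRIPLE_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(triple_counts(ROWS))
```

Each extra join multiplies the intermediate row count per customer, so on large tables it may help to deduplicate `(customerId, product)` first.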