cod*_*r25 9 sql hive hiveql apache-spark
I want to create a frequency distribution matrix.
1. Create a matrix. **Is it possible to get this in separate columns?**
customer 1 p1 p2 p3
customer 2 p2 p3
customer 3 p2 p3 p1
customer 4 p2 p1
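A minimal sketch of step 1, assuming a `sales(customerId, product)` table with one row per purchase (loaded here with the example rows above). Python's built-in `sqlite3` is only a stand-in engine for running the SQL; the `sum(case when ... end)` conditional aggregation is plain SQL that Hive and Spark SQL also accept, and it gives each product its own column. The product names are hard-coded, so a new product needs a new column.

```python
import sqlite3

# Example data from the question: one row per (customer, product) purchase.
rows = [
    ("customer 1", "p1"), ("customer 1", "p2"), ("customer 1", "p3"),
    ("customer 2", "p2"), ("customer 2", "p3"),
    ("customer 3", "p2"), ("customer 3", "p3"), ("customer 3", "p1"),
    ("customer 4", "p2"), ("customer 4", "p1"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (customerId text, product text)")
conn.executemany("insert into sales values (?, ?)", rows)

# Conditional aggregation: one column per product, holding how many times
# that customer bought it (0 if never).
matrix_sql = """
select customerId,
       sum(case when product = 'p1' then 1 else 0 end) as p1,
       sum(case when product = 'p2' then 1 else 0 end) as p2,
       sum(case when product = 'p3' then 1 else 0 end) as p3
from sales
group by customerId
order by customerId
"""
matrix = conn.execute(matrix_sql).fetchall()
for row in matrix:
    print(row)
```

Using `sum` rather than `max` keeps repeat purchases visible as counts; switch to `max` if you only want 0/1 flags.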
2. Then count which pairs of products come together most often.
For example:
p2 and p3 come together 3 times
p1 and p3 come together 2 times
p1 and p2 come together 3 times
I want to recommend products to customers based on how often products are bought together.
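One simple way to turn pair frequencies into recommendations (a hypothetical sketch, not a method from this thread): count how many customers bought each unordered product pair, then score every product a customer has not bought yet by summing its pair counts with the products they already own. The baskets are the example data; `co_occurrence` and `recommend` are illustrative names.

```python
from collections import Counter
from itertools import combinations

# Products per customer, as listed in the question.
baskets = {
    "customer 1": ["p1", "p2", "p3"],
    "customer 2": ["p2", "p3"],
    "customer 3": ["p2", "p3", "p1"],
    "customer 4": ["p2", "p1"],
}

def co_occurrence(baskets):
    """Count how many customers bought each unordered pair of products."""
    counts = Counter()
    for products in baskets.values():
        # Sort and de-duplicate so (p2, p1) and (p1, p2) are the same key.
        for pair in combinations(sorted(set(products)), 2):
            counts[pair] += 1
    return counts

def recommend(customer, baskets, counts):
    """Hypothetical scoring: for each product the customer lacks, sum its
    co-occurrence counts with the products the customer already owns."""
    owned = set(baskets[customer])
    all_products = {p for ps in baskets.values() for p in ps}
    scores = {
        candidate: sum(counts[tuple(sorted((candidate, p)))] for p in owned)
        for candidate in all_products - owned
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

counts = co_occurrence(baskets)
print(recommend("customer 2", baskets, counts))
```

On the example data, customer 2 owns p2 and p3, so the only candidate is p1, scored as count(p1, p2) + count(p1, p3) = 3 + 2 = 5.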
select customerId,product,count(*) from sales group by customerId,product
Can anyone help me with this?
If you want the pairs of products bought by the same customer, you can use a self-join:
select s1.product, s2.product, count(*) as cnt
from sales s1 join
sales s2
on s1.customerId = s2.customerId
where s1.product < s2.product
group by s1.product, s2.product
order by cnt desc;
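A quick way to check the self-join, assuming a `sales(customerId, product)` table loaded with the example rows from the question. Python's `sqlite3` stands in for Hive here; the query text is the answer's query unchanged.

```python
import sqlite3

# Example data from the question: one row per (customer, product) purchase.
rows = [
    ("customer 1", "p1"), ("customer 1", "p2"), ("customer 1", "p3"),
    ("customer 2", "p2"), ("customer 2", "p3"),
    ("customer 3", "p2"), ("customer 3", "p3"), ("customer 3", "p1"),
    ("customer 4", "p2"), ("customer 4", "p1"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (customerId text, product text)")
conn.executemany("insert into sales values (?, ?)", rows)

# Pair every product with every other product bought by the same customer;
# s1.product < s2.product keeps each unordered pair exactly once.
pairs_sql = """
select s1.product, s2.product, count(*) as cnt
from sales s1 join
     sales s2
     on s1.customerId = s2.customerId
where s1.product < s2.product
group by s1.product, s2.product
order by cnt desc
"""
pairs = conn.execute(pairs_sql).fetchall()
for row in pairs:
    print(row)
```

`count(*)` counts joined row pairs, so a customer who bought the same product twice is counted twice for that pair. If you want the number of distinct customers per pair, `count(distinct s1.customerId)` is one option.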
You can extend this to more than two products by adding more joins.
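Extending to three products means one more self-join and one more `<` condition, so each triple appears once in sorted order. A sketch under the same assumptions (a `sales(customerId, product)` table with the example rows, `sqlite3` as a stand-in for Hive):

```python
import sqlite3

# Example data from the question: one row per (customer, product) purchase.
rows = [
    ("customer 1", "p1"), ("customer 1", "p2"), ("customer 1", "p3"),
    ("customer 2", "p2"), ("customer 2", "p3"),
    ("customer 3", "p2"), ("customer 3", "p3"), ("customer 3", "p1"),
    ("customer 4", "p2"), ("customer 4", "p1"),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (customerId text, product text)")
conn.executemany("insert into sales values (?, ?)", rows)

# Two self-joins on customerId; the chained < conditions keep each
# unordered triple once, in sorted order.
triples_sql = """
select s1.product, s2.product, s3.product, count(*) as cnt
from sales s1
join sales s2 on s1.customerId = s2.customerId
join sales s3 on s1.customerId = s3.customerId
where s1.product < s2.product and s2.product < s3.product
group by s1.product, s2.product, s3.product
order by cnt desc
"""
triples = conn.execute(triples_sql).fetchall()
for row in triples:
    print(row)
```

On the example data only customers 1 and 3 bought all three products, so the single row is (p1, p2, p3) with a count of 2.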
Views: 397