Market basket analysis on order details

Tags: sql, market-basket-analysis (score: 6)

I have a table that looks like this (abbreviated):

| order_id  | item_id   | amount    | qty   | date          |
|---------- |---------  |--------   |-----  |------------   |
| 1         | 1         | 10        | 1     | 10-10-2014    |
| 1         | 2         | 20        | 2     | 10-10-2014    |
| 2         | 1         | 10        | 1     | 10-12-2014    |
| 2         | 2         | 20        | 1     | 10-12-2014    |
| 2         | 3         | 45        | 1     | 10-12-2014    |
| 3         | 1         | 10        | 1     | 9-9-2014      |
| 3         | 3         | 45        | 1     | 9-9-2014      |
| 4         | 2         | 20        | 1     | 11-11-2014    |

I want to run a query that lists the items that most often appear together in the same order.

For this data, the result would be:

| items | frequency |
|-------|-----------|
| 1,2   | 2         |
| 1,3   | 2         |
| 2,3   | 1         |
| 2     | 1         |

Ideally, the multi-item combinations would be listed first, followed by the single items that were most often ordered on their own.

Can anyone show an example of how to write this SQL?
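For reference, the expected figures can be checked outside SQL. The following is a minimal plain-Python sketch (not part of the original question; the variable names are illustrative) that groups the sample rows into one basket per `order_id`, counts every pair of distinct items sharing a basket, and then appends baskets that contain only one item. For this data the pair 1,3 occurs in both order 2 and order 3, so its frequency is 2.

```python
from collections import Counter
from itertools import combinations

# (order_id, item_id) pairs taken from the sample table above
rows = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 1), (3, 3), (4, 2)]

# Collect the distinct items of each order into a "basket"
baskets = {}
for order_id, item_id in rows:
    baskets.setdefault(order_id, set()).add(item_id)

# Count every unordered pair of items that appears in the same basket
pair_counts = Counter(
    pair
    for items in baskets.values()
    for pair in combinations(sorted(items), 2)
)

# Count baskets that hold exactly one distinct item
single_counts = Counter(
    next(iter(items)) for items in baskets.values() if len(items) == 1
)

# Pairs first, then single items; each group by descending frequency
result = [
    (",".join(map(str, pair)), n)
    for pair, n in sorted(pair_counts.items(), key=lambda kv: (-kv[1], kv[0]))
] + [
    (str(item), n)
    for item, n in sorted(single_counts.items(), key=lambda kv: (-kv[1], kv[0]))
]

for items, frequency in result:
    print(items, frequency)
```

This prints `1,2 2`, `1,3 2`, `2,3 1` and `2 1`, one per line.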

Answer by Pab*_*rre (score: 3)

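This query produces all of the requested output for cases where two items occur together. It leaves out the last row of the requested output because, strictly speaking, the single value (2) does not occur together with anything, although you could easily add a UNION query to include values that occur on their own.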

This was written for PostgreSQL 9.3:

```sql
CREATE TABLE orders (
    order_id int,
    item_id  int,
    amount   int,
    qty      int,
    date     timestamp
);

-- Sample data from the question
INSERT INTO orders VALUES (1, 1, 10, 1, '10-10-2014');
INSERT INTO orders VALUES (1, 2, 20, 2, '10-10-2014');
INSERT INTO orders VALUES (2, 1, 10, 1, '10-12-2014');
INSERT INTO orders VALUES (2, 2, 20, 1, '10-12-2014');
INSERT INTO orders VALUES (2, 3, 45, 1, '10-12-2014');
INSERT INTO orders VALUES (3, 1, 10, 1, '9-9-2014');
INSERT INTO orders VALUES (3, 3, 45, 1, '9-9-2014');
INSERT INTO orders VALUES (4, 2, 20, 1, '11-11-2014');

WITH order_pairs AS (
    -- Pair up items that share an order_id (joining on date instead
    -- would merge different orders placed on the same day).
    -- "<" keeps one ordering of each pair and excludes self-pairs.
    SELECT (pg1.item_id, pg2.item_id) AS items, pg1.order_id
    FROM (SELECT DISTINCT order_id, item_id FROM orders) AS pg1
    JOIN (SELECT DISTINCT order_id, item_id FROM orders) AS pg2
      ON pg1.order_id = pg2.order_id
     AND pg1.item_id < pg2.item_id
)
SELECT items, count(*) AS frequency
FROM order_pairs
GROUP BY items
ORDER BY frequency DESC, items;
```

Output:

```
 items | frequency
-------+-----------
 (1,2) |         2
 (1,3) |         2
 (2,3) |         1
(3 rows)
```
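The UNION mentioned in this answer, for including values that occur on their own, could look roughly like the following. This is a sketch, not the answer's tested code: to keep it self-contained it runs against SQLite through Python's standard `sqlite3` module rather than PostgreSQL 9.3. Because SQLite cannot select a row value such as `(a, b)` as a column, the pair is rendered as text like `1,2`, matching the question's format. The CTE names and the `kind` sort column are hypothetical helpers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INT, item_id INT, amount INT, qty INT, date TEXT)"
)
# Sample data from the question
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 10, 1, "10-10-2014"),
        (1, 2, 20, 2, "10-10-2014"),
        (2, 1, 10, 1, "10-12-2014"),
        (2, 2, 20, 1, "10-12-2014"),
        (2, 3, 45, 1, "10-12-2014"),
        (3, 1, 10, 1, "9-9-2014"),
        (3, 3, 45, 1, "9-9-2014"),
        (4, 2, 20, 1, "11-11-2014"),
    ],
)

QUERY = """
WITH items_per_order AS (
    SELECT DISTINCT order_id, item_id FROM orders
),
pairs AS (
    -- Two distinct items in the same order; "<" keeps one ordering per pair
    SELECT a.item_id AS i1, b.item_id AS i2
    FROM items_per_order AS a
    JOIN items_per_order AS b
      ON a.order_id = b.order_id AND a.item_id < b.item_id
),
singles AS (
    -- Orders that contain exactly one distinct item
    SELECT MIN(item_id) AS i1
    FROM items_per_order
    GROUP BY order_id
    HAVING COUNT(*) = 1
),
combined AS (
    SELECT 0 AS kind, i1, i2, i1 || ',' || i2 AS items, COUNT(*) AS frequency
    FROM pairs
    GROUP BY i1, i2
    UNION ALL
    SELECT 1, i1, NULL, CAST(i1 AS TEXT), COUNT(*)
    FROM singles
    GROUP BY i1
)
SELECT items, frequency
FROM combined
ORDER BY kind, frequency DESC, i1, i2
"""

rows = conn.execute(QUERY).fetchall()
for items, frequency in rows:
    print(items, frequency)
```

The `kind` column exists only so that pair rows sort ahead of single-item rows, as the question asks; ordering on the numeric `i1`, `i2` rather than the text `items` avoids `"1,10"` sorting before `"1,2"`. With the sample data this returns `('1,2', 2)`, `('1,3', 2)`, `('2,3', 1)` and `('2', 1)`.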