How to aggregate by the top N categories, with "All Others" and totals?

Bro*_*ams 6 sql t-sql sql-server sql-server-2017

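I have tables that list users' sales by category (each sale has at least one category and may have several).

I can get a user's top categories, but I need the user's statistics for his or her top N categories plus the remainder.

I've boiled the problem down to an MCVE, as follows...

MCVE data summary: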

Salesman    SaleID    Amount    Categories
--------    ------    ------    ------------------------------
     1         1         2      Service
     2         2         2      Software, Support_Contract
     2         3         3      Service
     2         4         1      Parts, Service, Software
     2         5         3      Support_Contract
     2         6         4      Promo_Gift, Support_Contract
     2         7        -2      Rebate, Support_Contract
     3         8         2      Software, Support_Contract
     3         9         3      Service
     3        10         1      Parts, Software
     3        11         3      Support_Contract
     3        12         4      Promo_Gift, Support_Contract
     3        13        -2      Rebate, Support_Contract

MCVE setup SQL:

CREATE TABLE Sales      ([Salesman] int, [SaleID] int, [Amount] int);
CREATE TABLE SalesTags  ([SaleID] int, [TagId] int);
CREATE TABLE Tags       ([TagId] int, [TagName] varchar(100) );

INSERT INTO Sales
    ([Salesman], [SaleID], [Amount])
VALUES
    (1, 1, 2),        (2, 6, 4),        (3, 10, 1),
    (2, 2, 2),        (2, 7, -2),       (3, 11, 3),
    (2, 3, 3),        (3, 8, 2),        (3, 12, 4),
    (2, 4, 1),        (3, 9, 3),        (3, 13, -2),
    (2, 5, 3)
;
INSERT INTO SalesTags
    ([SaleID], [TagId])
VALUES
    (1, 3),           (6, 4),           (10, 1),
    (2, 1),           (6, 5),           (10, 2),
    (2, 4),           (7, 4),           (11, 4),
    (3, 3),           (7, 6),           (12, 4),
    (4, 1),           (8, 1),           (12, 5),
    (4, 2),           (8, 4),           (13, 4),
    (4, 3),           (9, 3),           (13, 6),
    (5, 4)
;
INSERT INTO Tags
    ([TagId], [TagName])
VALUES
    (1, 'Software'),
    (2, 'Parts'),
    (3, 'Service'),
    (4, 'Support_Contract'),
    (5, 'Promo_Gift'),
    (6, 'Rebate')
;

See this SQL Fiddle. I can get a user's top N tags like so:

WITH usersSales AS (  -- actual base CTE is much more complex
    SELECT  s.SaleID
            , s.Amount
    FROM    Sales s
    WHERE   s.Salesman = 2
)
SELECT Top 3  -- N can be 3 to 10
            t.TagName
            , COUNT (us.SaleID)     AS tagSales
            , SUM (us.Amount)       AS tagAmount
FROM        usersSales us
INNER JOIN  SalesTags st    ON st.SaleID = us.SaleID
INNER JOIN  Tags t          ON t.TagId   = st.TagId
GROUP BY    t.TagName
ORDER BY    tagAmount DESC
            , tagSales DESC
            , t.TagName

This shows that the user's top categories are:

  1. "Support_Contract"
  2. "Service"
  3. "Promo_Gift"

in that order, for user 2. (And Support_Contract, Promo_Gift, Software for user 3.)

But the needed results, for N = 3, are:

  • User 2:

    Top Category        Amount    Number of Sales
    ----------------    ------    ---------------
    Support_Contract       7             4
    Service                4             2
    Promo_Gift             0             0
    - All Others -         0             0
    ============================================
    Totals                11             6
    
  • User 3:

    Top Category        Amount    Number of Sales
    ----------------    ------    ---------------
    Support_Contract       7             4
    Promo_Gift             0             0
    Software               1             1
    - All Others -         3             1
    ============================================
    Totals                11             6
    

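Where:

  1. For a given sale, the Top Category is the user's highest-ranked category among those the sale is tagged with (ranked as per the query above).
  2. The top category on row 2 excludes sales already accounted for on row 1.
  3. The top category on row 3 excludes sales already accounted for on rows 1 and 2.
  4. And so on.
  5. All remaining sales, not counted in the top N categories, are lumped into the - All Others - group.
  6. The totals at the bottom match the user's overall sales figures.

How do I aggregate results like this?

Note that this runs on MS SQL Server 2017, and I cannot change the table schema.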

Vla*_*nov 5

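Here is one way to do it. Run the query step by step, CTE by CTE, and examine the intermediate results to understand how it works.

It is not the most efficient approach, because I end up joining the table to itself to eliminate sales that were already summed up at a higher rank, but at the moment I don't see how to avoid that.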

WITH usersSales 
AS 
(  -- actual base CTE is much more complex
    SELECT
        s.SaleID
        , s.Amount
    FROM Sales s
    WHERE s.Salesman = 2
)
,CTE_Sums
AS
(
    SELECT
        t.TagName
        ,us.Amount
        ,us.SaleID
        ,SUM(us.Amount) OVER (PARTITION BY t.TagName) AS TagAmount
        ,COUNT(*) OVER (PARTITION BY t.TagName) AS TagSales
    FROM
        usersSales us
        INNER JOIN SalesTags st ON st.SaleID = us.SaleID
        INNER JOIN Tags t ON t.TagId = st.TagId
)
,CTE_Rank
AS
(
    SELECT
        TagName
        ,Amount
        ,SaleID
        ,TagAmount
        ,TagSales
        ,DENSE_RANK() OVER (ORDER BY TagAmount DESC, TagSales DESC, TagName) AS rnk
    FROM CTE_Sums
)
,CTE_Final
AS
(
    SELECT
        Main.TagName
        ,Main.Amount
        ,Main.SaleID
        ,Main.TagAmount
        ,Main.TagSales
        ,Main.rnk
        ,ISNULL(A.FinalTagAmount, 0) AS FinalTagAmount
        ,A.FinalTagSales
    FROM
        CTE_Rank AS Main
        OUTER APPLY
        (
            SELECT
                SUM(Detail.Amount) AS FinalTagAmount
                ,COUNT(*) AS FinalTagSales
            FROM CTE_Rank AS Detail
            WHERE
                Detail.rnk = Main.rnk
                AND Detail.SaleID NOT IN
                (
                    SELECT PrevRanks.SaleID
                    FROM CTE_Rank AS PrevRanks
                    WHERE PrevRanks.rnk < Detail.rnk
                )
        ) AS A
)
SELECT
    TagName
    ,MIN(FinalTagAmount) AS FinalTagAmount
    ,MIN(FinalTagSales) AS FinalTagSales
    ,rnk
    ,0 AS SortOrder
FROM CTE_Final
WHERE rnk <= 3
GROUP BY
    TagName
    ,rnk

UNION ALL

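SELECT
    '- All Others -' AS TagName
    ,ISNULL(SUM(FinalTagAmount), 0) AS FinalTagAmount
    ,ISNULL(SUM(FinalTagSales), 0) AS FinalTagSales
    ,0 AS rnk
    ,1 AS SortOrder
FROM
(
    -- CTE_Final repeats each tag's totals on every one of its rows,
    -- so collapse to one row per tag before summing to avoid double counting.
    SELECT DISTINCT rnk, FinalTagAmount, FinalTagSales
    FROM CTE_Final
    WHERE rnk > 3
) AS Others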

ORDER BY
    SortOrder
    ,rnk
;

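CTE_Rank

Don't group and sum the rows yet; instead, use windowed aggregates to get the rank of each tag. Later we will need the individual rows (SaleID) with their individual amounts in order to filter out the rows that have already been used.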

+------------------+--------+--------+-----------+----------+-----+
|     TagName      | Amount | SaleID | TagAmount | TagSales | rnk |
+------------------+--------+--------+-----------+----------+-----+
| Support_Contract |     -2 |      7 |         7 |        4 |   1 |
| Support_Contract |      3 |      5 |         7 |        4 |   1 |
| Support_Contract |      4 |      6 |         7 |        4 |   1 |
| Support_Contract |      2 |      2 |         7 |        4 |   1 |
| Service          |      1 |      4 |         4 |        2 |   2 |
| Service          |      3 |      3 |         4 |        2 |   2 |
| Promo_Gift       |      4 |      6 |         4 |        1 |   3 |
| Software         |      1 |      4 |         3 |        2 |   4 |
| Software         |      2 |      2 |         3 |        2 |   4 |
| Parts            |      1 |      4 |         1 |        1 |   5 |
| Rebate           |     -2 |      7 |        -2 |        1 |   6 |
+------------------+--------+--------+-----------+----------+-----+
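
To follow the "CTE by CTE" advice, the first three CTEs can be run on their own with a plain SELECT at the end. This is only a sketch that repeats the answer's CTEs unchanged; the final ORDER BY is an addition for readability, so the row order within a tag may differ from the table above.

```sql
WITH usersSales AS (
    SELECT s.SaleID, s.Amount
    FROM   Sales s
    WHERE  s.Salesman = 2
)
, CTE_Sums AS (
    -- One row per (sale, tag), with the tag's totals repeated on every row.
    SELECT
        t.TagName
        ,us.Amount
        ,us.SaleID
        ,SUM(us.Amount) OVER (PARTITION BY t.TagName) AS TagAmount
        ,COUNT(*)       OVER (PARTITION BY t.TagName) AS TagSales
    FROM
        usersSales us
        INNER JOIN SalesTags st ON st.SaleID = us.SaleID
        INNER JOIN Tags t       ON t.TagId   = st.TagId
)
, CTE_Rank AS (
    -- TagName is the final tie-breaker, so every tag gets its own rank.
    SELECT
        TagName, Amount, SaleID, TagAmount, TagSales
        ,DENSE_RANK() OVER (ORDER BY TagAmount DESC, TagSales DESC, TagName) AS rnk
    FROM CTE_Sums
)
SELECT *
FROM   CTE_Rank
ORDER BY rnk, SaleID;   -- ordering added here only to make the output easier to read
```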

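CTE_Final

OUTER APPLY does the main calculation: for each tag, it sums only those sales that were not already encountered in a higher-ranked tag (one with a smaller rnk).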

+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
|     TagName      | Amount | SaleID | TagAmount | TagSales | rnk | FinalTagAmount | FinalTagSales |
+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
| Support_Contract |     -2 |      7 |         7 |        4 |   1 |              7 |             4 |
| Support_Contract |      3 |      5 |         7 |        4 |   1 |              7 |             4 |
| Support_Contract |      4 |      6 |         7 |        4 |   1 |              7 |             4 |
| Support_Contract |      2 |      2 |         7 |        4 |   1 |              7 |             4 |
| Service          |      1 |      4 |         4 |        2 |   2 |              4 |             2 |
| Service          |      3 |      3 |         4 |        2 |   2 |              4 |             2 |
| Promo_Gift       |      4 |      6 |         4 |        1 |   3 |              0 |             0 |
| Software         |      1 |      4 |         3 |        2 |   4 |              0 |             0 |
| Software         |      2 |      2 |         3 |        2 |   4 |              0 |             0 |
| Parts            |      1 |      4 |         1 |        1 |   5 |              0 |             0 |
| Rebate           |     -2 |      7 |        -2 |        1 |   6 |              0 |             0 |
+------------------+--------+--------+-----------+----------+-----+----------------+---------------+

Final result

Simply put the top 3 tags together with everything else, collected under "- All Others -".

+------------------+----------------+---------------+-----+-----------+
|     TagName      | FinalTagAmount | FinalTagSales | rnk | SortOrder |
+------------------+----------------+---------------+-----+-----------+
| Support_Contract |              7 |             4 |   1 |         0 |
| Service          |              4 |             2 |   2 |         0 |
| Promo_Gift       |              0 |             0 |   3 |         0 |
| - All Others -   |              0 |             0 |   0 |         1 |
+------------------+----------------+---------------+-----+-----------+
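
The question also asks for a Totals row, and the answer notes that its self-join is not ideal. The following is an alternative sketch, not the answerer's method: it ranks the tags once, lets each sale be "claimed" by its best-ranked tag via MIN(rnk), and then aggregates, which avoids the NOT IN self-join. The names @Salesman, @TopN, TagRank, SaleBest and the output column names are hypothetical. It assumes every sale has at least one tag (as the question states) and that each TagId maps to exactly one TagName. Traced by hand against the sample data, it gives the rows the question asks for, for users 2 and 3 with N = 3.

```sql
DECLARE @Salesman int = 2;   -- hypothetical parameters
DECLARE @TopN     int = 3;

WITH usersSales AS (
    SELECT s.SaleID, s.Amount
    FROM   Sales s
    WHERE  s.Salesman = @Salesman
)
, TagRank AS (
    -- Rank the tags with the same ordering as the question's TOP N query.
    SELECT  t.TagId
          , t.TagName
          , ROW_NUMBER() OVER (ORDER BY SUM(us.Amount) DESC
                                      , COUNT(us.SaleID) DESC
                                      , t.TagName) AS rnk
    FROM        usersSales us
    INNER JOIN  SalesTags st ON st.SaleID = us.SaleID
    INNER JOIN  Tags t       ON t.TagId   = st.TagId
    GROUP BY    t.TagId, t.TagName
)
, SaleBest AS (
    -- Each sale is claimed once, by the best-ranked tag it carries.
    -- Assumption: a sale with no tags at all would be dropped here.
    SELECT  us.SaleID
          , us.Amount
          , MIN(tr.rnk) AS bestRnk
    FROM        usersSales us
    INNER JOIN  SalesTags st ON st.SaleID = us.SaleID
    INNER JOIN  TagRank tr   ON tr.TagId  = st.TagId
    GROUP BY    us.SaleID, us.Amount
)
-- Top N tags; a tag whose sales were all claimed earlier shows 0 / 0.
SELECT      tr.TagName                AS TopCategory
          , ISNULL(SUM(sb.Amount), 0) AS Amount
          , COUNT(sb.SaleID)          AS NumberOfSales
          , tr.rnk                    AS SortOrder
FROM        TagRank tr
LEFT JOIN   SaleBest sb ON sb.bestRnk = tr.rnk
WHERE       tr.rnk <= @TopN
GROUP BY    tr.TagName, tr.rnk

UNION ALL   -- everything claimed by a tag outside the top N

SELECT      '- All Others -', ISNULL(SUM(Amount), 0), COUNT(SaleID), @TopN + 1
FROM        SaleBest
WHERE       bestRnk > @TopN

UNION ALL   -- totals taken straight from the user's sales

SELECT      'Totals', ISNULL(SUM(Amount), 0), COUNT(SaleID), @TopN + 2
FROM        usersSales

ORDER BY    SortOrder;
```

Because the Totals row is computed from usersSales rather than from the rows above it, it also serves as a check: the Amount and NumberOfSales of the other rows should add up to it.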