Select a subset of rows that exceed a percentage of the total value

ben*_*ear 7 sql t-sql sql-server running-total sql-server-2008

I have a table with customers, users and revenue similar to the one below (in reality there are thousands of records):

Customer   User    Revenue
001        James   500
002        James   750
003        James   450
004        Sarah   100
005        Sarah   500
006        Sarah   150
007        Sarah   600
008        James   150
009        James   100

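What I want to do is return only the highest-spending customers that make up 80% of each user's total revenue.

To do this manually, I would sort James's customers by revenue in descending order, work out each row's percentage of the total and a running total percentage, then return only the records up to and including the one where the running total reaches 80%: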

Customer    User    Revenue     % of total  Running Total %
002         James   750         0.38        0.38 
001         James   500         0.26        0.64 
003         James   450         0.23        0.87  <- Greater than 80%, last record
008         James   150         0.08        0.95 
009         James   100         0.05        1.00 

I have tried using a CTE but so far have come up blank. Is there a way to do this with a single query rather than manually in an Excel sheet?

Luk*_*zda 7
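
To try the queries in this answer locally, the sample data from the question can be loaded as sketched below. The table name `t` matches the 2008 queries further down; the column types and the primary key on `Customer` are assumptions, since the question does not state them.

```sql
-- Hypothetical setup for the sample data in the question.
-- Table name t follows the later queries; column types are assumed.
CREATE TABLE t
(
    Customer INT         NOT NULL PRIMARY KEY,
    [User]   VARCHAR(50) NOT NULL,
    Revenue  INT         NOT NULL
);

INSERT INTO t (Customer, [User], Revenue)
VALUES (1, 'James', 500),
       (2, 'James', 750),
       (3, 'James', 450),
       (4, 'Sarah', 100),
       (5, 'Sarah', 500),
       (6, 'Sarah', 150),
       (7, 'Sarah', 600),
       (8, 'James', 150),
       (9, 'James', 100);
```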

SQL Server 2012+ only

You can use a windowed SUM:

WITH cte AS
(
   SELECT *,
          1.0 * Revenue/SUM(Revenue) OVER(PARTITION BY [User]) AS percentile,
          1.0 * SUM(Revenue) OVER(PARTITION BY [User] ORDER BY [Revenue] DESC)
                /SUM(Revenue) OVER(PARTITION BY [User]) AS running_percentile
   FROM t
)
SELECT *
FROM cte 
WHERE running_percentile <= 0.8;

LiveDemo
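
If the row that crosses the 80% threshold should also be returned (as in the manual example in the question, where customer 003 is kept), one possible variation for SQL Server 2012+ is to filter on the running total before the current row instead of after it. This is only a sketch: the table name `t`, the explicit `ROWS` frame, and the use of `Customer` as a tiebreaker for equal revenues are assumptions, not part of the original query.

```sql
-- Sketch for SQL Server 2012+: keep rows until the running total
-- BEFORE the current row has already reached 80% of the user's total.
WITH cte AS
(
    SELECT Customer, [User], Revenue,
           SUM(Revenue) OVER (PARTITION BY [User]) AS total,
           SUM(Revenue) OVER (PARTITION BY [User]
                              ORDER BY Revenue DESC, Customer  -- Customer breaks ties (assumed unique)
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
    FROM t
)
SELECT Customer, [User], Revenue,
       1.0 * Revenue       / NULLIF(total, 0) AS percentile,
       1.0 * running_total / NULLIF(total, 0) AS running_percentile
FROM cte
WHERE running_total - Revenue < 0.8 * total   -- previous running total is still below 80%
ORDER BY [User], Revenue DESC;
```

Comparing totals rather than divided percentages avoids rounding in the filter. For James, the row with revenue 450 has a previous running total of 1250 out of 1950 (about 0.64), so it is kept; the next row starts from 1700 (about 0.87) and is dropped.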


SQL Server 2008:

WITH cte AS
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY Revenue DESC) AS rn
    FROM t    
), cte2 AS
(
    SELECT c.Customer, c.[User], c.[Revenue]
           ,percentile         = 1.0 * Revenue / NULLIF(c3.s,0)
           ,running_percentile = 1.0 * c2.s    / NULLIF(c3.s,0)
    FROM cte c
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM cte c2
          WHERE c.[User] = c2.[User]
            AND c2.rn <= c.rn) c2
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM cte c2
          WHERE c.[User] = c2.[User]) AS c3
) 
SELECT *
FROM cte2
WHERE running_percentile <= 0.8;

LiveDemo2

Output:

+----------+-------+---------+----------------+--------------------+
| Customer | User  | Revenue |   percentile   | running_percentile |
+----------+-------+---------+----------------+--------------------+
|        2 | James |     750 | 0,384615384615 | 0,384615384615     |
|        1 | James |     500 | 0,256410256410 | 0,641025641025     |
|        7 | Sarah |     600 | 0,444444444444 | 0,444444444444     |
+----------+-------+---------+----------------+--------------------+

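EDIT 2:

"Looks like it's almost there. The only trouble is that it's missing the last row: the third row for James takes him over 0.80, but it needs to be included."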

WITH cte AS
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY Revenue DESC) AS rn
    FROM t    
), cte2 AS
(
    SELECT c.Customer, c.[User], c.[Revenue]
           ,percentile         = 1.0 * Revenue / NULLIF(c3.s,0)
           ,running_percentile = 1.0 * c2.s    / NULLIF(c3.s,0)
    FROM cte c
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM cte c2
          WHERE c.[User] = c2.[User]
            AND c2.rn <= c.rn) c2
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM cte c2
          WHERE c.[User] = c2.[User]) AS c3
) 
SELECT a.*
FROM cte2 a
CROSS APPLY (SELECT MIN(running_percentile) AS rp
             FROM cte2
             WHERE running_percentile >= 0.8
               AND cte2.[User] = a.[User]) AS s
WHERE a.running_percentile <= s.rp;

LiveDemo3

Output:

+----------+-------+---------+----------------+--------------------+
| Customer | User  | Revenue |   percentile   | running_percentile |
+----------+-------+---------+----------------+--------------------+
|        2 | James |     750 | 0,384615384615 | 0,384615384615     |
|        1 | James |     500 | 0,256410256410 | 0,641025641025     |
|        3 | James |     450 | 0,230769230769 | 0,871794871794     |
|        7 | Sarah |     600 | 0,444444444444 | 0,444444444444     |
|        5 | Sarah |     500 | 0,370370370370 | 0,814814814814     |
+----------+-------+---------+----------------+--------------------+

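"Looks perfect. I translated it to my big table and it returns what I need. I spent 5 minutes working through it and still can't follow what you've done!"

SQL Server 2008 doesn't support everything in the OVER() clause (notably ORDER BY for aggregates such as SUM, which a running total needs), but ROW_NUMBER with ORDER BY is supported.

The first cte just calculates each row's position within its group: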

+-----------+--------+----------+----+
| Customer  | User   | Revenue  | rn |
+-----------+--------+----------+----+
|        2  | James  |     750  |  1 |
|        1  | James  |     500  |  2 |
|        3  | James  |     450  |  3 |
|        8  | James  |     150  |  4 |
|        9  | James  |     100  |  5 |
|        7  | Sarah  |     600  |  1 |
|        5  | Sarah  |     500  |  2 |
|        6  | Sarah  |     150  |  3 |
|        4  | Sarah  |     100  |  4 |
+-----------+--------+----------+----+

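The second cte:

  • The c2 subquery calculates the running total based on the rank from ROW_NUMBER
  • The c3 subquery calculates the full sum per user

In the final query, the s subquery finds the lowest running percentage that reaches or exceeds 80%; every row at or below that value is returned.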
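
To see what c2 and c3 contribute, it can help to look at the raw sums before they are divided. The following diagnostic query is only an illustration; the inner alias `x` and the output column names `running_total` and `user_total` are made up here, and the table is assumed to be `t`.

```sql
-- Diagnostic sketch: exposes the intermediate values used by the second cte.
WITH cte AS
(
    SELECT *, ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY Revenue DESC) AS rn
    FROM t
)
SELECT c.Customer, c.[User], c.Revenue, c.rn,
       c2.s AS running_total,   -- this row plus all higher-ranked rows of the same user
       c3.s AS user_total       -- every row of the same user
FROM cte c
CROSS APPLY
     (SELECT SUM(x.Revenue) AS s
      FROM cte x
      WHERE x.[User] = c.[User]
        AND x.rn <= c.rn) c2
CROSS APPLY
     (SELECT SUM(x.Revenue) AS s
      FROM cte x
      WHERE x.[User] = c.[User]) c3
ORDER BY c.[User], c.rn;
```

With the sample data, James's running_total goes 750, 1250, 1700, 1850, 1950 against a user_total of 1950. The first value at or above 80% is 1700 / 1950 (about 0.87), which is the threshold the s subquery picks for him.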

EDIT 3:

Using ROW_NUMBER is actually redundant; the running total can be computed by comparing Revenue directly.

WITH cte AS
(
    SELECT c.Customer, c.[User], c.[Revenue]
           ,percentile         = 1.0 * Revenue / NULLIF(c3.s,0)
           ,running_percentile = 1.0 * c2.s    / NULLIF(c3.s,0)
    FROM t c
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM t c2
          WHERE c.[User] = c2.[User]
            AND c2.Revenue >= c.Revenue) c2
    CROSS APPLY
         (SELECT SUM(Revenue) AS s
          FROM t c2
          WHERE c.[User] = c2.[User]) AS c3
) 
SELECT a.*
FROM cte a
CROSS APPLY (SELECT MIN(running_percentile) AS rp
             FROM cte c2
             WHERE running_percentile >= 0.8
               AND c2.[User] = a.[User]) AS s
WHERE a.running_percentile <= s.rp
ORDER BY [User], Revenue DESC;

LiveDemo4
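
One thing to watch with the `Revenue >= Revenue` comparison: two customers of the same user with identical revenue would each get a running total that already includes the other, so tied rows are kept or dropped together. If that matters, a tiebreaker can be added to the comparison. The sketch below assumes `Customer` is unique within the table; the inner aliases `x` and `b` are illustrative.

```sql
-- SQL Server 2008-compatible sketch with Customer as a tiebreaker (assumed unique).
WITH cte AS
(
    SELECT c.Customer, c.[User], c.Revenue
           ,percentile         = 1.0 * c.Revenue / NULLIF(c3.s, 0)
           ,running_percentile = 1.0 * c2.s      / NULLIF(c3.s, 0)
    FROM t c
    CROSS APPLY
         (SELECT SUM(x.Revenue) AS s
          FROM t x
          WHERE x.[User] = c.[User]
            AND (x.Revenue > c.Revenue
                 OR (x.Revenue = c.Revenue AND x.Customer <= c.Customer))) c2
    CROSS APPLY
         (SELECT SUM(x.Revenue) AS s
          FROM t x
          WHERE x.[User] = c.[User]) c3
)
SELECT a.*
FROM cte a
CROSS APPLY (SELECT MIN(b.running_percentile) AS rp
             FROM cte b
             WHERE b.running_percentile >= 0.8
               AND b.[User] = a.[User]) s
WHERE a.running_percentile <= s.rp
ORDER BY a.[User], a.Revenue DESC, a.Customer;
```

On the sample data there are no ties within a user, so this returns the same five rows as the query above.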

  • @bendataclear Please see the update (2 upvotes)