ben*_*ear 7 sql t-sql sql-server running-total sql-server-2008
I have a table with customers, users, and revenue similar to the one below (in reality it has thousands of records):
Customer  User   Revenue
001       James  500
002       James  750
003       James  450
004       Sarah  100
005       Sarah  500
006       Sarah  150
007       Sarah  600
008       James  150
009       James  100
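What I want to do is return only the top-spending customers that make up 80% of each user's total revenue.
To do this manually, I would sort James's customers by revenue in descending order, calculate each customer's percentage of the total and the running total percentage, and then return only the records up to and including the one where the running total reaches 80%: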
Customer  User   Revenue  % of total  Running Total %
002       James  750      0.38        0.38
001       James  500      0.26        0.64
003       James  450      0.23        0.87   <- Greater than 80%, last record
008       James  150      0.08        0.95
009       James  100      0.05        1.00
I have tried using a CTE but have drawn a blank so far. Is there a way to do this with a single query rather than manually in an Excel sheet?
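For anyone who wants to reproduce the example, the sample rows above can be loaded with a script along these lines. This is only a sketch: the table name `t` and the column types are assumptions, since the real schema is not shown here.

```sql
-- Hypothetical schema: the table name and column types are assumptions.
CREATE TABLE t
(
    Customer CHAR(3)     NOT NULL PRIMARY KEY,  -- kept as text to preserve leading zeros
    [User]   VARCHAR(50) NOT NULL,              -- USER is a reserved word, hence the brackets
    Revenue  INT         NOT NULL
);

-- Multi-row VALUES lists are available from SQL Server 2008 onwards.
INSERT INTO t (Customer, [User], Revenue)
VALUES ('001', 'James', 500),
       ('002', 'James', 750),
       ('003', 'James', 450),
       ('004', 'Sarah', 100),
       ('005', 'Sarah', 500),
       ('006', 'Sarah', 150),
       ('007', 'Sarah', 600),
       ('008', 'James', 150),
       ('009', 'James', 100);
```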
SQL Server 2012+ only:
You can use a windowed SUM:
WITH cte AS
(
    SELECT *,
           -- each customer's share of the user's total revenue
           1.0 * Revenue / SUM(Revenue) OVER (PARTITION BY [User]) AS percentile,
           -- cumulative share, highest revenue first
           1.0 * SUM(Revenue) OVER (PARTITION BY [User] ORDER BY [Revenue] DESC)
               / SUM(Revenue) OVER (PARTITION BY [User]) AS running_percentile
    FROM tab
)
SELECT *
FROM cte
WHERE running_percentile <= 0.8;
SQL Server 2008:
WITH cte AS
(
    -- rank each user's customers by revenue, highest first
    SELECT *, ROW_NUMBER() OVER (PARTITION BY [User] ORDER BY Revenue DESC) AS rn
    FROM t
), cte2 AS
(
    SELECT c.Customer, c.[User], c.[Revenue]
          ,percentile = 1.0 * Revenue / NULLIF(c3.s, 0)
          ,running_percentile = 1.0 * c2.s / NULLIF(c3.s, 0)
    FROM cte c
    CROSS APPLY
        -- running total: this row plus every higher-ranked row of the same user
        (SELECT SUM(Revenue) AS s
         FROM cte c2
         WHERE c.[User] = c2.[User]
           AND c2.rn <= c.rn) c2
    CROSS APPLY
        -- total revenue for the user
        (SELECT SUM(Revenue) AS s
         FROM cte c2
         WHERE c.[User] = c2.[User]) AS c3
)
SELECT *
FROM cte2
WHERE running_percentile <= 0.8;
Output:
+----------+-------+---------+----------------+--------------------+
| Customer | User  | Revenue | percentile     | running_percentile |
+----------+-------+---------+----------------+--------------------+
| 2        | James | 750     | 0.384615384615 | 0.384615384615     |
| 1        | James | 500     | 0.256410256410 | 0.641025641025     |
| 7        | Sarah | 600     | 0.444444444444 | 0.444444444444     |
+----------+-------+---------+----------------+--------------------+
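Edit 2:
Looks like it's almost there. The only trouble is that it is missing the last row: James's third row takes him over 0.80, but it needs to be included.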
WITH cte AS
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY [User] ORDER BY Revenue DESC) AS rn
    FROM t
), cte2 AS
(
    SELECT c.Customer, c.[User], c.[Revenue]
          ,percentile = 1.0 * Revenue / NULLIF(c3.s, 0)
          ,running_percentile = 1.0 * c2.s / NULLIF(c3.s, 0)
    FROM cte c
    CROSS APPLY
        (SELECT SUM(Revenue) AS s
         FROM cte c2
         WHERE c.[User] = c2.[User]
           AND c2.rn <= c.rn) c2
    CROSS APPLY
        (SELECT SUM(Revenue) AS s
         FROM cte c2
         WHERE c.[User] = c2.[User]) AS c3
)
SELECT a.*
FROM cte2 a
-- rp: the first running share at or above 80% for this user
CROSS APPLY (SELECT MIN(running_percentile) AS rp
             FROM cte2
             WHERE running_percentile >= 0.8
               AND cte2.[User] = a.[User]) AS s
-- keep every row up to and including the one that crosses 80%
WHERE a.running_percentile <= s.rp;
Output:
+----------+-------+---------+----------------+--------------------+
| Customer | User  | Revenue | percentile     | running_percentile |
+----------+-------+---------+----------------+--------------------+
| 2        | James | 750     | 0.384615384615 | 0.384615384615     |
| 1        | James | 500     | 0.256410256410 | 0.641025641025     |
| 3        | James | 450     | 0.230769230769 | 0.871794871794     |
| 7        | Sarah | 600     | 0.444444444444 | 0.444444444444     |
| 5        | Sarah | 500     | 0.370370370370 | 0.814814814814     |
+----------+-------+---------+----------------+--------------------+
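Looks perfect. I translated it to my big table and it returns what I need. I spent 5 minutes going through it and still can't follow what you did!
SQL Server 2008 does not support everything in the OVER() clause (a windowed SUM with ORDER BY only arrived in SQL Server 2012), but ROW_NUMBER is supported.
The first cte only calculates each row's position within its group: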
+----------+-------+---------+----+
| Customer | User  | Revenue | rn |
+----------+-------+---------+----+
| 2        | James | 750     | 1  |
| 1        | James | 500     | 2  |
| 3        | James | 450     | 3  |
| 8        | James | 150     | 4  |
| 9        | James | 100     | 5  |
| 7        | Sarah | 600     | 1  |
| 5        | Sarah | 500     | 2  |
| 6        | Sarah | 150     | 3  |
| 4        | Sarah | 100     | 4  |
+----------+-------+---------+----+
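The second cte:
- the c2 subquery calculates the running total based on the rank from ROW_NUMBER
- the c3 subquery calculates the full total per user
In the final query, the s subquery finds the lowest running percentage that reaches or exceeds 80%.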
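On SQL Server 2012+, the same "include the row that crosses 80%" behaviour can be sketched with window functions alone. This is a variant under stated assumptions rather than part of the answer above: it assumes the table is named `t`, that Customer is unique and can serve as a tie-breaker, and that revenues are non-negative integers. A row is kept while the running total before it is still below 80% of the user's total, and the comparison uses totals rather than rounded percentages.

```sql
WITH cte AS
(
    SELECT Customer, [User], Revenue,
           -- total revenue per user
           SUM(Revenue) OVER (PARTITION BY [User]) AS user_total,
           -- running total, highest revenue first; ROWS plus the Customer
           -- tie-breaker (an assumption) gives tied revenues distinct running totals
           SUM(Revenue) OVER (PARTITION BY [User]
                              ORDER BY Revenue DESC, Customer
                              ROWS UNBOUNDED PRECEDING) AS running_total
    FROM t
)
SELECT Customer, [User], Revenue,
       1.0 * Revenue       / NULLIF(user_total, 0) AS percentile,
       1.0 * running_total / NULLIF(user_total, 0) AS running_percentile
FROM cte
-- running_total - Revenue is the running total before the current row
WHERE running_total - Revenue < 0.8 * user_total
ORDER BY [User], Revenue DESC, Customer;
```

Without an explicit ROWS frame, a windowed SUM with ORDER BY defaults to a RANGE frame, which treats rows with equal revenue as peers and gives them the same running total.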
Edit 3:
Using ROW_NUMBER is actually redundant.
WITH cte AS
(
    SELECT c.Customer, c.[User], c.[Revenue]
          ,percentile = 1.0 * Revenue / NULLIF(c3.s, 0)
          ,running_percentile = 1.0 * c2.s / NULLIF(c3.s, 0)
    FROM t c
    CROSS APPLY
        -- running total without ROW_NUMBER: sum every row of the same user
        -- whose revenue is at least this row's revenue
        (SELECT SUM(Revenue) AS s
         FROM t c2
         WHERE c.[User] = c2.[User]
           AND c2.Revenue >= c.Revenue) c2
    CROSS APPLY
        -- total revenue for the user
        (SELECT SUM(Revenue) AS s
         FROM t c2
         WHERE c.[User] = c2.[User]) AS c3
)
SELECT a.*
FROM cte a
CROSS APPLY (SELECT MIN(running_percentile) AS rp
             FROM cte c2
             WHERE running_percentile >= 0.8
               AND c2.[User] = a.[User]) AS s
WHERE a.running_percentile <= s.rp
ORDER BY [User], Revenue DESC;
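One caveat with the `c2.Revenue >= c.Revenue` comparison: customers of the same user with identical revenue receive the same running total, so tied rows are kept or dropped together. If ties should instead be ordered deterministically, a possible variant is sketched below. It is not part of the original answer. It assumes Customer is unique and usable as a tie-breaker and that revenues are non-negative integers. The aliases `rt`, `tot`, `x`, and `y` are made up for illustration. It also swaps the MIN lookup for a direct check that the running total before the current row is still below 80% of the user's total.

```sql
WITH cte AS
(
    SELECT c.Customer, c.[User], c.Revenue,
           rt.s  AS running_total,
           tot.s AS user_total
    FROM t c
    CROSS APPLY
        -- running total: rows with higher revenue, plus tied rows
        -- up to and including this Customer (tie-breaker is an assumption)
        (SELECT SUM(x.Revenue) AS s
         FROM t x
         WHERE x.[User] = c.[User]
           AND (x.Revenue > c.Revenue
                OR (x.Revenue = c.Revenue AND x.Customer <= c.Customer))) AS rt
    CROSS APPLY
        -- total revenue for the user
        (SELECT SUM(y.Revenue) AS s
         FROM t y
         WHERE y.[User] = c.[User]) AS tot
)
SELECT Customer, [User], Revenue,
       1.0 * Revenue       / NULLIF(user_total, 0) AS percentile,
       1.0 * running_total / NULLIF(user_total, 0) AS running_percentile
FROM cte
-- keep a row while the running total before it is below 80% of the user's total
WHERE running_total - Revenue < 0.8 * user_total
ORDER BY [User], Revenue DESC, Customer;
```

For the nine sample rows in the question, this should return the same five rows as the Edit 2 query. It uses only CROSS APPLY and plain aggregates, so it should also run on SQL Server 2008.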