Luk*_*zda 17 sql t-sql sql-server postgresql sql-standards
Background:
The initial scenario is very simple: calculate the running total for each user, from highest to lowest revenue:
CREATE TABLE t(Customer INTEGER NOT NULL PRIMARY KEY
,"User" VARCHAR(5) NOT NULL
,Revenue INTEGER NOT NULL);
INSERT INTO t(Customer,"User",Revenue) VALUES
(001,'James',500),(002,'James',750),(003,'James',450),
(004,'Sarah',100),(005,'Sarah',500),(006,'Sarah',150),
(007,'Sarah',600),(008,'James',150),(009,'James',100);
Query:
SELECT *,
1.0 * Revenue/SUM(Revenue) OVER(PARTITION BY "User") AS percentage,
1.0 * SUM(Revenue) OVER(PARTITION BY "User" ORDER BY Revenue DESC)
/SUM(Revenue) OVER(PARTITION BY "User") AS running_percentage
FROM t;
Output:
╔══════════╦═══════╦═════════╦════════════╦════════════════════╗
║ Customer ║ User  ║ Revenue ║ percentage ║ running_percentage ║
╠══════════╬═══════╬═════════╬════════════╬════════════════════╣
║ 2        ║ James ║ 750     ║ 0.38       ║ 0.38               ║
║ 1        ║ James ║ 500     ║ 0.26       ║ 0.64               ║
║ 3        ║ James ║ 450     ║ 0.23       ║ 0.87               ║
║ 8        ║ James ║ 150     ║ 0.08       ║ 0.95               ║
║ 9        ║ James ║ 100     ║ 0.05       ║ 1                  ║
║ 7        ║ Sarah ║ 600     ║ 0.44       ║ 0.44               ║
║ 5        ║ Sarah ║ 500     ║ 0.37       ║ 0.81               ║
║ 6        ║ Sarah ║ 150     ║ 0.11       ║ 0.93               ║
║ 4        ║ Sarah ║ 100     ║ 0.07       ║ 1                  ║
╚══════════╩═══════╩═════════╩════════════╩════════════════════╝
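As a cross-check of the figures in the output, here is a minimal plain-Python sketch of the same arithmetic. It is not part of the original question: the function name `running_percentages` is hypothetical, and the rows are copied from the INSERT statement. It assumes that rows with equal revenue count together, which is how a windowed SUM with ORDER BY and no explicit frame treats peers.

```python
from itertools import groupby

# Sample rows copied from the INSERT statement: (Customer, User, Revenue)
rows = [
    (1, "James", 500), (2, "James", 750), (3, "James", 450),
    (4, "Sarah", 100), (5, "Sarah", 500), (6, "Sarah", 150),
    (7, "Sarah", 600), (8, "James", 150), (9, "James", 100),
]


def running_percentages(rows):
    """Return (customer, user, revenue, percentage, running_percentage) tuples."""
    result = []
    # Sort by user, then by revenue descending (mirrors PARTITION BY / ORDER BY).
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    for user, group in groupby(ordered, key=lambda r: r[1]):
        group = list(group)
        total = sum(r[2] for r in group)
        for customer, _, revenue in group:
            # Sum every revenue in the partition that is >= the current one,
            # so tied rows are included together.
            running = sum(r[2] for r in group if r[2] >= revenue)
            result.append((customer, user, revenue, revenue / total, running / total))
    return result


for row in running_percentages(rows):
    print(row[0], row[1], row[2], round(row[3], 2), round(row[4], 2))
```

The `r[2] >= revenue` filter is the same condition that the rewritten queries express as `c2.Revenue >= c.Revenue`.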
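It could also be calculated in other ways using specific window functions.
Now let's assume that we cannot use the windowed SUM and rewrite the query: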
SELECT c.Customer, c."User", c."Revenue"
,1.0 * Revenue / NULLIF(c3.s,0) AS percentage
,1.0 * c2.s / NULLIF(c3.s,0) AS running_percentage
FROM t c
CROSS APPLY
(SELECT SUM(Revenue) AS s
FROM t c2
WHERE c."User" = c2."User"
AND c2.Revenue >= c.Revenue) AS c2
CROSS APPLY
(SELECT SUM(Revenue) AS s
FROM t c2
WHERE c."User" = c2."User") AS c3
ORDER BY "User", Revenue DESC;
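I used CROSS APPLY because I don't like correlated subqueries in the SELECT column list, and because c3 is used twice.
Everything works as expected. But on closer inspection, c2 and c3 are very similar, so why not combine them and use simple conditional aggregation: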
SELECT c.Customer, c."User", c."Revenue"
,1.0 * Revenue / NULLIF(c2.sum_total,0) AS percentage
,1.0 * c2.sum_running / NULLIF(c2.sum_total,0) AS running_percentage
FROM t c
CROSS APPLY
(SELECT SUM(Revenue) AS sum_total,
SUM(CASE WHEN c2.Revenue >= c.Revenue THEN Revenue ELSE 0 END)
AS sum_running
FROM t c2
WHERE c."User" = c2."User") AS c2
ORDER BY "User", Revenue DESC;
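Unfortunately, this is not possible. SQL Server rejects the query with:
Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.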
Of course, I could work around it by wrapping it in another subquery, but it becomes a bit "ugly":
SELECT c.Customer, c."User", c."Revenue"
,1.0 * Revenue / NULLIF(c2.sum_total,0) AS percentage
,1.0 * c2.sum_running / NULLIF(c2.sum_total,0) AS running_percentage
FROM t c
CROSS APPLY
( SELECT SUM(Revenue) AS sum_total,
SUM(running_revenue) AS sum_running
FROM (SELECT Revenue,
CASE WHEN c2.Revenue >= c.Revenue THEN Revenue ELSE 0 END
AS running_revenue
FROM t c2
WHERE c."User" = c2."User") AS sub
) AS c2
ORDER BY "User", Revenue DESC
PostgreSQL version. The only difference is LATERAL instead of CROSS APPLY.
SELECT c.Customer, c."User", c.Revenue
,1.0 * Revenue / NULLIF(c2.sum_total,0) AS percentage
,1.0 * c2.running_sum / NULLIF(c2.sum_total,0) AS running_percentage
FROM t c
,LATERAL (SELECT SUM(Revenue) AS sum_total,
SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
AS running_sum
FROM t c2
WHERE c."User" = c2."User") c2
ORDER BY "User", Revenue DESC;
It works perfectly.
SQLite/MySQL version (this is why I prefer LATERAL/CROSS APPLY):
SELECT c.Customer, c."User", c.Revenue,
1.0 * Revenue / (SELECT SUM(Revenue)
FROM t c2
WHERE c."User" = c2."User") AS percentage,
1.0 * (SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
FROM t c2
WHERE c."User" = c2."User") /
(SELECT SUM(c2.Revenue)
FROM t c2
WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;
SQLFiddleDemo-SQLite SQLFiddleDemo-MySQL
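I have read Aggregates with an Outer Reference:
The source for the restriction is in the SQL-92 standard, and SQL Server inherited it from the Sybase codebase. The problem is that SQL Server needs to figure out which query will compute the aggregate.
I am not looking for answers that only show how to circumvent it.
The questions are:
Does SQL Server behave as the SQL Standard says it should, or does SQL Server not implement it fully (correctly?)?
I would be very grateful for references to:
ISO standard (92 or newer)
SQL Server/PostgreSQL/Oracle/...).
Edit:
I know that SQL-92 has no concept of LATERAL. But the version with subqueries (as in SQLite/MySQL) does not work either.
Edit 2:
To simplify it a bit, let's check only the correlated subquery: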
SELECT c.Customer, c."User", c.Revenue,
1.0*(SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
FROM t c2
WHERE c."User" = c2."User")
/ (SELECT SUM(c2.Revenue)
FROM t c2
WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;
The version above works fine in MySQL/SQLite/PostgreSQL.
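The SQLite half of that claim can be checked with a small harness. The sketch below is an illustration rather than part of the original post: it uses Python's standard `sqlite3` module against an in-memory database, with the table definition and the Edit 2 query copied from above. It says nothing about MySQL or PostgreSQL.

```python
import sqlite3

# In-memory SQLite database holding the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t(Customer INTEGER NOT NULL PRIMARY KEY
              ,"User"   VARCHAR(5) NOT NULL
              ,Revenue  INTEGER NOT NULL);
INSERT INTO t(Customer,"User",Revenue) VALUES
(1,'James',500),(2,'James',750),(3,'James',450),
(4,'Sarah',100),(5,'Sarah',500),(6,'Sarah',150),
(7,'Sarah',600),(8,'James',150),(9,'James',100);
""")

# The Edit 2 query: the conditional aggregate mixes an inner column
# (c2.Revenue) with an outer reference (c.Revenue).
query = """
SELECT c.Customer, c."User", c.Revenue,
       1.0 * (SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
              FROM t c2
              WHERE c."User" = c2."User")
           / (SELECT SUM(c2.Revenue)
              FROM t c2
              WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;
"""

result = conn.execute(query).fetchall()
for customer, user, revenue, running in result:
    print(customer, user, revenue, round(running, 2))
```

Rounded to two decimals, the running percentages should agree with the running_percentage column in the output shown for the window-function query.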
In SQL Server we get an error. After wrapping it in a subquery to "flatten" it to one level, it works:
SELECT c.Customer, c."User", c.Revenue,
1.0 * (
SELECT SUM(CASE WHEN r1 >= r2 THEN r1 ELSE 0 END)
FROM (SELECT c2.Revenue AS r1, c.Revenue r2
FROM t c2
WHERE c."User" = c2."User") AS S) /
(SELECT SUM(c2.Revenue)
FROM t c2
WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;
The point of this question is how the SQL standard regulates it.
There is a simpler solution:
SELECT c.Customer, c."User", c."Revenue",
1.0 * Revenue/ NULLIF(c2.sum_total, 0) AS percentage,
1.0 * c2.sum_running / NULLIF(c2.sum_total, 0) AS running_percentage
FROM t c CROSS APPLY
(SELECT SUM(c2.Revenue) AS sum_total,
SUM(CASE WHEN c2.Revenue >= x.Revenue THEN c2.Revenue ELSE 0 END)
as sum_running
FROM t c2 CROSS JOIN
(SELECT c.REVENUE) x
WHERE c."User" = c2."User"
) c2
ORDER BY "User", Revenue DESC;
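I'm not sure why, or even whether, this restriction is in the SQL '92 standard. I did have it pretty well memorized about twenty years ago, but I don't recall any particular restriction.
I should note: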