SQL Server - conditional aggregation with correlation

Luk*_*zda 17 sql t-sql sql-server postgresql sql-standards

Background:

The original case was very simple. Calculate a running total per user, ordered from highest to lowest revenue:

CREATE TABLE t(Customer INTEGER  NOT NULL PRIMARY KEY 
              ,"User"   VARCHAR(5) NOT NULL
              ,Revenue  INTEGER  NOT NULL);

INSERT INTO t(Customer,"User",Revenue) VALUES
(001,'James',500),(002,'James',750),(003,'James',450),
(004,'Sarah',100),(005,'Sarah',500),(006,'Sarah',150),
(007,'Sarah',600),(008,'James',150),(009,'James',100);

Query:

SELECT *,
    1.0 * Revenue/SUM(Revenue) OVER(PARTITION BY "User") AS percentage,
    1.0 * SUM(Revenue) OVER(PARTITION BY "User" ORDER BY Revenue DESC)
         /SUM(Revenue) OVER(PARTITION BY "User") AS running_percentage
FROM t;

LiveDemo

Output:

+----------+-------+---------+------------+--------------------+
| Customer | User  | Revenue | percentage | running_percentage |
+----------+-------+---------+------------+--------------------+
|        2 | James |     750 | 0.38       | 0.38               |
|        1 | James |     500 | 0.26       | 0.64               |
|        3 | James |     450 | 0.23       | 0.87               |
|        8 | James |     150 | 0.08       | 0.95               |
|        9 | James |     100 | 0.05       | 1                  |
|        7 | Sarah |     600 | 0.44       | 0.44               |
|        5 | Sarah |     500 | 0.37       | 0.81               |
|        6 | Sarah |     150 | 0.11       | 0.93               |
|        4 | Sarah |     100 | 0.07       | 1                  |
+----------+-------+---------+------------+--------------------+

It could be calculated differently using specific window functions.
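For anyone who wants to sanity-check the figures without a database, here is a minimal Python sketch that recomputes the same two columns from the sample rows. The names `ROWS` and `recompute_table` are my own for illustration. The "running" part sums every row of the same user whose revenue is greater than or equal to the current row's, which is how the default window frame with ORDER BY treats ties.

```python
from collections import defaultdict

# Sample rows from the question: (Customer, User, Revenue)
ROWS = [
    (1, "James", 500), (2, "James", 750), (3, "James", 450),
    (4, "Sarah", 100), (5, "Sarah", 500), (6, "Sarah", 150),
    (7, "Sarah", 600), (8, "James", 150), (9, "James", 100),
]

def recompute_table(rows):
    """Return (Customer, User, Revenue, percentage, running_percentage) tuples."""
    by_user = defaultdict(list)
    for customer, user, revenue in rows:
        by_user[user].append((customer, revenue))

    out = []
    for user in sorted(by_user):
        total = sum(rev for _, rev in by_user[user])
        # Highest revenue first, mirroring ORDER BY Revenue DESC
        for customer, revenue in sorted(by_user[user], key=lambda r: -r[1]):
            # Running sum: all rows of this user with revenue >= current revenue
            running = sum(rev for _, rev in by_user[user] if rev >= revenue)
            out.append((customer, user, revenue,
                        round(revenue / total, 2),
                        round(running / total, 2)))
    return out

if __name__ == "__main__":
    for row in recompute_table(ROWS):
        print(row)
```

The values are rounded to two decimals only to make them comparable with the output table above.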


Now let's assume that we cannot use the windowed SUM and rewrite it:

SELECT c.Customer, c."User", c."Revenue"
    ,1.0 * Revenue / NULLIF(c3.s,0) AS percentage
    ,1.0 * c2.s    / NULLIF(c3.s,0) AS running_percentage
FROM t c
CROSS APPLY
        (SELECT SUM(Revenue) AS s
        FROM t c2
        WHERE c."User" = c2."User"
            AND c2.Revenue >= c.Revenue) AS c2
CROSS APPLY
        (SELECT SUM(Revenue) AS s
        FROM t c2
        WHERE c."User" = c2."User") AS c3
ORDER BY "User", Revenue DESC;

LiveDemo

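I used CROSS APPLY because I do not like correlated subqueries in the SELECT column list, and because c3 is used twice.

Everything works as expected. But looking closer, c2 and c3 are very similar. So why not combine them and use simple conditional aggregation: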

SELECT c.Customer, c."User", c."Revenue"
    ,1.0 * Revenue        / NULLIF(c2.sum_total,0) AS percentage
    ,1.0 * c2.sum_running / NULLIF(c2.sum_total,0) AS running_percentage
FROM t c
CROSS APPLY
        (SELECT SUM(Revenue) AS sum_total,
                SUM(CASE WHEN c2.Revenue >= c.Revenue THEN Revenue ELSE 0 END) 
                AS sum_running
        FROM t c2
        WHERE c."User" = c2."User") AS c2
ORDER BY "User", Revenue DESC;
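
Unfortunately, this is not possible.

Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.

Of course, I could work around it by wrapping it in another subquery, but it becomes a bit "ugly":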

SELECT c.Customer, c."User", c."Revenue"
    ,1.0 * Revenue        / NULLIF(c2.sum_total,0) AS percentage
    ,1.0 * c2.sum_running / NULLIF(c2.sum_total,0) AS running_percentage
FROM t c
CROSS APPLY
(   SELECT SUM(Revenue) AS sum_total,
           SUM(running_revenue) AS sum_running
     FROM (SELECT Revenue,
                  CASE WHEN c2.Revenue >= c.Revenue THEN Revenue ELSE 0 END 
                  AS running_revenue
           FROM t c2
           WHERE c."User" = c2."User") AS sub
) AS c2
ORDER BY "User", Revenue DESC

LiveDemo


PostgreSQL version. The only difference is LATERAL instead of CROSS APPLY.

SELECT c.Customer, c."User", c.Revenue
    ,1.0 * Revenue        / NULLIF(c2.sum_total,0) AS percentage 
    ,1.0 * c2.running_sum / NULLIF(c2.sum_total,0) AS running_percentage 
FROM t c
,LATERAL (SELECT SUM(Revenue) AS sum_total,
                 SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END) 
                 AS running_sum
        FROM t c2
        WHERE c."User" = c2."User") c2
ORDER BY "User", Revenue DESC;

SqlFiddleDemo

It works very nicely.


SQLite/MySQL version (this is why I prefer LATERAL/CROSS APPLY):

SELECT c.Customer, c."User", c.Revenue,
    1.0 * Revenue / (SELECT SUM(Revenue)
                     FROM t c2
                     WHERE c."User" = c2."User") AS percentage,
    1.0 * (SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
           FROM t c2
          WHERE c."User" = c2."User")  / 
          (SELECT SUM(c2.Revenue)
           FROM t c2
           WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;

SQLFiddleDemo-SQLite SQLFiddleDemo-MySQL


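I have read "Aggregates with an Outer Reference":

The source for the restriction is in the SQL-92 standard, and SQL Server inherited it from the Sybase codebase. The problem is that SQL Server needs to figure out which query will compute the aggregate.

I am not looking for answers that only show how to circumvent it.

The questions are:

  1. Which part of the standard disallows or interferes with it?
  2. Why do other RDBMSes not have a problem with this kind of outer dependency?
  3. Do they extend the SQL Standard while SQL Server behaves as it should, or does SQL Server not implement it fully (correctly?)?

I would be very grateful for references to:

  • ISO standard (92 or newer)
  • SQL Server standards support
  • official documentation from any RDBMS that explains it (SQL Server/Postgresql/Oracle/...).

EDIT:

I know that SQL-92 has no concept of LATERAL. But the version with subqueries (as in SQLite/MySQL) does not work either.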

LiveDemo

EDIT 2:

To simplify it a bit, let's check only the correlated subquery:

SELECT c.Customer, c."User", c.Revenue,
       1.0*(SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue THEN c2.Revenue ELSE 0 END)
              FROM t c2
              WHERE c."User" = c2."User") 
       / (SELECT SUM(c2.Revenue)
          FROM t c2
          WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;

The version above works fine in MySQL/SQLite/Postgresql.
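The SQLite part of that claim is easy to reproduce locally with Python's built-in `sqlite3` module. This is only a sketch: the function name `run_correlated_subquery` is made up for illustration, the table and query are copied from above, and it says nothing about how SQL Server treats the same statement.

```python
import sqlite3

def run_correlated_subquery():
    """Run the EDIT 2 query against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # Same schema and sample data as in the question
    conn.executescript("""
        CREATE TABLE t(Customer INTEGER NOT NULL PRIMARY KEY,
                       "User"   VARCHAR(5) NOT NULL,
                       Revenue  INTEGER NOT NULL);
        INSERT INTO t(Customer, "User", Revenue) VALUES
        (1,'James',500),(2,'James',750),(3,'James',450),
        (4,'Sarah',100),(5,'Sarah',500),(6,'Sarah',150),
        (7,'Sarah',600),(8,'James',150),(9,'James',100);
    """)
    # The aggregate mixes an inner column (c2.Revenue) with an
    # outer reference (c.Revenue); SQLite accepts this.
    rows = conn.execute("""
        SELECT c.Customer, c."User", c.Revenue,
               1.0 * (SELECT SUM(CASE WHEN c2.Revenue >= c.Revenue
                                      THEN c2.Revenue ELSE 0 END)
                      FROM t c2
                      WHERE c."User" = c2."User")
               / (SELECT SUM(c2.Revenue)
                  FROM t c2
                  WHERE c."User" = c2."User") AS running_percentage
        FROM t c
        ORDER BY "User", Revenue DESC
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in run_correlated_subquery():
        print(row)
```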

In SQL Server we get the error. After wrapping it in a subquery to "flatten" it to one level, it works:

SELECT c.Customer, c."User", c.Revenue,
      1.0 * (
              SELECT SUM(CASE WHEN r1 >= r2 THEN r1 ELSE 0 END)
              FROM (SELECT c2.Revenue AS r1, c.Revenue r2
                    FROM t c2
                    WHERE c."User" = c2."User") AS S)  / 
             (SELECT SUM(c2.Revenue)
              FROM t c2
              WHERE c."User" = c2."User") AS running_percentage
FROM t c
ORDER BY "User", Revenue DESC;

The point of this question is how the SQL standard regulates it.

LiveDemo

Gor*_*off 4

There is an easier solution:

SELECT c.Customer, c."User", c."Revenue",
       1.0 * Revenue/ NULLIF(c2.sum_total, 0) AS percentage,
       1.0 * c2.sum_running / NULLIF(c2.sum_total, 0) AS running_percentage
FROM t c CROSS APPLY
     (SELECT SUM(c2.Revenue) AS sum_total,
             SUM(CASE WHEN c2.Revenue >= x.Revenue THEN c2.Revenue ELSE 0 END) 
                 as sum_running
      FROM t c2 CROSS JOIN
           (SELECT c.REVENUE) x
      WHERE c."User" = c2."User"
     ) c2
ORDER BY "User", Revenue DESC;
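
I am not sure why, or whether, this limitation is in the SQL '92 standard. I did have it pretty well memorized twenty or so years ago, but I don't recall that particular limitation.

I should note:

  • At the time of the SQL 92 standard, lateral joins were not really on the radar. Sybase definitely had no such concept.
  • Other databases do have problems with outer references. In particular, they often limit the scoping to one level deep.
  • The SQL Standard itself tends to be highly political (that is, vendor-driven) rather than driven by actual database user requirements. Well, over time, it does move in the right direction.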