GROUPING SETS with computed expressions returns unexpected results

sep*_*pic 8 sql-server execution-plan group-by

Here we have two similar queries that use grouping sets, where the SELECT clause contains some expressions that are also computed in the aggregation:

SELECT RN10, RN10 / 10, COUNT(*) FROM 
(
       SELECT RN, RN/10 AS RN10, RN/100 AS RN100 FROM 
       (
               SELECT RN = -1 + ROW_NUMBER() OVER (ORDER BY 1/0) 
               FROM master..spt_values
       ) A
) B
GROUP BY GROUPING SETS ((RN10), (RN10 / 10), ())
ORDER BY 1, 2

Its plan is here: first query plan

SELECT RN10, SUBSTRING(RN,3,99), COUNT(*) FROM 
(
       SELECT RN, SUBSTRING(RN,2,99) AS RN10 FROM 
       (
               SELECT RN = CAST(-1 + ROW_NUMBER() OVER (ORDER BY 1/0) AS VARCHAR(99)) 
               FROM master..spt_values
       ) A
) B
GROUP BY GROUPING SETS ((RN10), (SUBSTRING(RN,3,99)), ())
ORDER BY 1, 2

The corresponding plan is here: second query plan

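Both queries first compute an expression that is used for the aggregation, RN10 / 10 in the first case and SUBSTRING(RN,3,99) in the second, and then use the same expression in the SELECT clause. The first plan, however, shows that the expression is recomputed in the first query, but not in the second.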

As a result, we get NULLs in the first result set, which is quite unexpected:

Results

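Can someone explain why the first query performs the computation twice (once in the aggregation and once in the final select), while the second performs it only once?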

Mar*_*ith 12

I'll use a simpler example in which it is easy to see what the expected results are.

CREATE TABLE Queen
(
   FirstName        VARCHAR(7),
   Surname          VARCHAR(7)
); 

INSERT INTO Queen
    (FirstName, Surname)
VALUES
    ('Brian',   'May'),
    ('Freddie', 'Mercury'),
    ('John',    'Deacon'),
    ('Roger',   'Taylor')
;

Query 1

SELECT Surname,
       NULL AS SurnameInitial,
       COUNT(*) AS Count
FROM   Queen
GROUP  BY Surname
UNION ALL
SELECT NULL AS Surname,
       LEFT(Surname,1) AS SurnameInitial,
       COUNT(*) AS Count
FROM   Queen
GROUP  BY LEFT(Surname,1)

Query 1 results

+---------+----------------+-------+
| Surname | SurnameInitial | Count |
+---------+----------------+-------+
| Deacon  | NULL           |     1 |
| May     | NULL           |     1 |
| Mercury | NULL           |     1 |
| Taylor  | NULL           |     1 |
| NULL    | D              |     1 |
| NULL    | M              |     2 |
| NULL    | T              |     1 |
+---------+----------------+-------+

Query 2

SELECT Surname,
       LEFT(Surname,1) AS SurnameInitial,
       COUNT(*) AS Count
FROM   Queen
GROUP  BY GROUPING SETS ( ( Surname ), (LEFT(Surname,1)) ) 
ORDER BY SurnameInitial, Surname

Query 2 results

Despite the ORDER BY SurnameInitial, and the fact that NULL sorts first in SQL Server, the rows with SurnameInitial as NULL are ordered last.

+---------+----------------+-------+
| Surname | SurnameInitial | Count |
+---------+----------------+-------+
| Deacon  | D              |     1 |
| May     | M              |     1 |
| Mercury | M              |     1 |
| Taylor  | T              |     1 |
| NULL    | NULL           |     1 |
| NULL    | NULL           |     2 |
| NULL    | NULL           |     1 |
+---------+----------------+-------+
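To see which grouping set produced each row of Query 2, one option is to add the GROUPING function to the select list. This is only a diagnostic sketch against the Queen table above; the alias SurnameAggregated is a name chosen here for illustration, and no output is shown because it has not been run on an affected build.

```sql
-- GROUPING(Surname) returns 0 when Surname is part of the grouping set
-- that produced the row, and 1 when Surname was aggregated away, so it
-- separates the (Surname) rows from the (LEFT(Surname,1)) rows.
SELECT Surname,
       LEFT(Surname,1)   AS SurnameInitial,
       GROUPING(Surname) AS SurnameAggregated, -- hypothetical alias
       COUNT(*)          AS Count
FROM   Queen
GROUP  BY GROUPING SETS ( ( Surname ), (LEFT(Surname,1)) )
ORDER  BY SurnameInitial, Surname;
```

Rows where the flag is 1 belong to the LEFT(Surname,1) grouping set, which is where a non-NULL SurnameInitial would be expected.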

Queries 1 and 2 should return the same results. The problem is that SQL Server decides to treat Query 2 like the following SQL

WITH GrpSets AS
(
SELECT Surname,
       COUNT(*) AS Count
FROM   Queen
GROUP  BY Surname
UNION ALL
SELECT NULL AS Surname,
       COUNT(*) AS Count
FROM   Queen
GROUP  BY LEFT(Surname,1)
)
SELECT Surname,
       LEFT(Surname,1) AS SurnameInitial,
       Count
FROM GrpSets

This just looks like a bug to me (trace flag 8605 shows that the damage is already done in the initial query tree representation). Bug report
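For anyone who wants to look at that initial tree themselves, a sketch along the following lines should print it to the Messages tab. It assumes a test instance and sysadmin rights; trace flag 8605 is undocumented, and trace flag 3604 is added here only to redirect the trace output to the client.

```sql
-- Print the converted (initial) query tree for Query 2.
-- RECOMPILE forces a fresh compilation so the trace output is produced.
-- Trace flags 3604 and 8605 are undocumented; use on a test system only.
SELECT Surname,
       LEFT(Surname,1) AS SurnameInitial,
       COUNT(*) AS Count
FROM   Queen
GROUP  BY GROUPING SETS ( ( Surname ), (LEFT(Surname,1)) )
OPTION (RECOMPILE, QUERYTRACEON 3604, QUERYTRACEON 8605);
```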

Query 3

SELECT Surname,
       LEFT(FirstName,1) AS FirstNameInitial,
       COUNT(*) AS Count
FROM   Queen
GROUP  BY GROUPING SETS ( ( Surname ), (LEFT(FirstName,1)) ) 

Query 3 results

+---------+------------------+-------+
| Surname | FirstNameInitial | Count |
+---------+------------------+-------+
| NULL    | B                |     1 |
| NULL    | F                |     1 |
| NULL    | J                |     1 |
| NULL    | R                |     1 |
| Deacon  | NULL             |     1 |
| May     | NULL             |     1 |
| Mercury | NULL             |     1 |
| Taylor  | NULL             |     1 |
+---------+------------------+-------+

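Query 3 does not fit the problematic pattern of grouping on a column and on an expression that references that column. In any event, the same problem could not even occur here, because the grouping sets part is equivalent to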

SELECT Surname,
       NULL AS FirstNameInitial,
       COUNT(*) AS Count
FROM   Queen
GROUP  BY Surname
UNION ALL
SELECT NULL AS Surname,
       LEFT(FirstName,1) AS FirstNameInitial,
       COUNT(*) AS Count
FROM   Queen
GROUP  BY LEFT(FirstName,1)

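This does not pass the whole FirstName column upstream (nor is there even a guaranteed unique FirstName that could be passed out for that column), so it is not possible for the LEFT(FirstName,1) expression to be computed on top of it.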

For the same reason, you do not see the problem with (RN10), (SUBSTRING(RN,3,99)).

The reason is very likely as given in the comment by @i-one:

A bug in normalization (algebrization). It has to do with SELECT and GROUP BY. The same logic, it seems, allows us to write, for example

SELECT Surname, LEFT(Surname, 1), COUNT(*)
FROM   Queen
GROUP BY Surname

without explicitly adding the computed expression, as below

GROUP BY Surname, LEFT(Surname, 1)
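The relaxed matching described in that comment can be contrasted with an expression that does not depend on a grouped column. The following is a minimal sketch against the Queen table; the alias FirstNameInitial is reused from Query 3 for illustration.

```sql
-- Accepted: LEFT(Surname, 1) references only the grouped column Surname.
SELECT Surname, LEFT(Surname, 1) AS SurnameInitial, COUNT(*) AS Count
FROM   Queen
GROUP  BY Surname;

-- Rejected with error 8120: FirstName is neither grouped nor aggregated,
-- so LEFT(FirstName, 1) cannot be derived from the grouping columns.
SELECT Surname, LEFT(FirstName, 1) AS FirstNameInitial, COUNT(*) AS Count
FROM   Queen
GROUP  BY Surname;
```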

Another example is

SELECT Surname,
       LEFT(Surname,1) AS SurnameInitial,
       LEFT(Surname,2) AS SurnamePrefix,
       COUNT(*) AS Count
FROM   Queen
GROUP  BY GROUPING SETS ( ( Surname ), (LEFT(Surname,1)) ) 

In this case LEFT(Surname,2) is allowed, and the only way to compute it is the very way that is problematic for the LEFT(Surname,1) case.
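Following the pattern of Query 1, one reading of what this last query ought to return can be written as an explicit UNION ALL. This is a sketch of the expected semantics under that assumption, not a statement of what SQL Server actually produces: SurnamePrefix can only be derived where Surname itself is a grouping column, so it is NULL in the second branch.

```sql
-- Branch 1: grouping set (Surname). LEFT(Surname,2) is derivable from
-- the grouped column; SurnameInitial is not part of this set.
SELECT Surname,
       NULL            AS SurnameInitial,
       LEFT(Surname,2) AS SurnamePrefix,
       COUNT(*)        AS Count
FROM   Queen
GROUP  BY Surname
UNION ALL
-- Branch 2: grouping set (LEFT(Surname,1)). Surname is aggregated away,
-- so Surname and anything computed from it are NULL.
SELECT NULL            AS Surname,
       LEFT(Surname,1) AS SurnameInitial,
       NULL            AS SurnamePrefix,
       COUNT(*)        AS Count
FROM   Queen
GROUP  BY LEFT(Surname,1);
```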