Selectivity estimation error for a simple query

Rad*_*ača 5 sql sql-server estimation sql-server-2016

Let's create a simple table tt like this:

WITH x AS (SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(n)), t1 AS
(
  SELECT ones.n + 10 * tens.n + 100 * hundreds.n + 1000 * thousands.n + 10000 * tenthousands.n as id  
  FROM x ones,     x tens,      x hundreds,       x thousands,       x tenthousands,       x hundredthousands
)
SELECT  id,
        id % 100 groupby,
        row_number() over (partition by id % 100 order by id) orderby,
        row_number() over (partition by id % 100 order by id) / (id % 100 + 1) local_search
INTO tt
FROM t1

I have a simple query Q1:

select distinct g1.groupby,
        (select count(*) from tt g2 
         where local_search = 1 and g1.groupby = g2.groupby) as orderby
from tt g1
option(maxdop 1)

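I would like to know why SQL Server estimates the result size of Q1 so badly (see the screenshot). Most operators in the query plan are estimated precisely, but the root Hash Match operator introduces a completely wild guess.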

[Screenshot: query plan for Q1]

To make it more interesting, I tried different rewrites of Q1. If I decorrelate the subquery, I get an equivalent query Q2:

select main.groupby, 
       coalesce(sub1.orderby,0) orderby
from
(
    select distinct g1.groupby
    from tt g1
) main
left join
(
    select groupby, count(*) orderby
    from tt g2 
    where local_search = 1
    group by groupby
) sub1 on sub1.groupby = main.groupby
option(maxdop 1)

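This query is interesting in two respects: (1) the estimate is accurate (see the screenshot), and (2) it also has a different query plan, which is more efficient than the plan for Q1.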

[Screenshot: query plan for Q2]

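So the question is: why is the estimate for Q1 so inaccurate, whereas the estimate for Q2 is precise? Please do not post other rewrites of this SQL (I know it can be written even without the subquery); I am only interested in an explanation of the selectivity estimator's behavior. Thanks.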

Mar*_*ith 3

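It doesn't recognize that the orderby value is the same for all rows with the same groupby value, so it thinks that distinct groupby, orderby will have more combinations than just distinct groupby.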
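One way to confirm the dependency described above is to count the combinations directly. The following is a minimal sketch against the tt table from the question; the derived-table alias d and the column aliases distinct_groupby and distinct_pairs are illustrative names, not part of the original post.

```sql
-- Wrap Q1 in a derived table and compare two counts:
--   distinct_groupby: number of distinct groupby values
--   distinct_pairs:   number of distinct (groupby, orderby) rows Q1 returns
-- If orderby is fully determined by groupby, the two counts are equal.
SELECT COUNT(DISTINCT d.groupby) AS distinct_groupby,
       COUNT(*)                  AS distinct_pairs
FROM   (SELECT DISTINCT g1.groupby,
               (SELECT COUNT(*)
                FROM   tt g2
                WHERE  g2.local_search = 1
                       AND g1.groupby = g2.groupby) AS orderby
        FROM   tt g1) d
OPTION (MAXDOP 1);
```

Because the subquery is correlated only on groupby, each groupby value yields exactly one orderby value, so both columns should report the same number. The optimizer, however, does not derive this relationship.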

It multiplies the estimate for DISTINCT orderby (for me this is 35.0367) by the estimate for DISTINCT groupby (for me this is 100), as though they were uncorrelated.

I get an estimate of 3503.67 for the root node in Q1.
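The root-node figure is simply the product of the two distinct-count estimates quoted above, which is what treating the columns as independent amounts to. Written out as arithmetic (the numbers are the ones reported in this answer and may differ on another machine or statistics sample):

```latex
\[
\underbrace{35.0367}_{\text{estimated DISTINCT orderby}}
\times
\underbrace{100}_{\text{estimated DISTINCT groupby}}
= 3503.67
\]
```

Since orderby is determined by groupby, the true number of distinct (groupby, orderby) pairs cannot exceed the number of distinct groupby values, which is why the multiplied estimate overshoots.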

This rewrite avoids the problem, because it now groups by the single column groupby only:

SELECT groupby,
       max(orderby) AS orderby
FROM   (SELECT g1.groupby,
               (SELECT count(*)
                FROM   tt g2
                WHERE  local_search = 1
                       AND g1.groupby = g2.groupby) AS orderby
        FROM   tt g1) d
GROUP  BY groupby
OPTION(maxdop 1) 

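That said, this is not the best approach for this query: as Q2 and the comment from @GarethD show, running the correlated subquery many times and then discarding the duplicates is inefficient.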