Why doesn't SQL Server optimize UNION?

孔夫子*_*孔夫子 7 performance sql-server optimization union query-performance

Consider these queries (SQL Fiddle):

Query 1:

SELECT * INTO #TMP1 FROM Foo
UNION
SELECT * FROM Boo
UNION
SELECT * FROM Koo;

Query 2:

SELECT * INTO #TMP2 FROM Foo
UNION
SELECT * FROM Boo
UNION ALL
SELECT * FROM Koo;

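Note that Koo does not overlap with Boo or Foo, so both queries return the same final result. The question is: why is the first UNION / UNION combination not merged into a single SORT operation?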

Pau*_*ite 18

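The query optimizer does have n-ary operators, but the execution engine has far fewer of them. To illustrate, I will use a simplified version of your tables (SQL Fiddle):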

SELECT DISTINCT
    number
INTO foo
FROM master..spt_values
WHERE 
    number < 1000;

SELECT DISTINCT
    number
INTO boo
FROM master..spt_values
WHERE 
    number between 300 and 1005;

SELECT DISTINCT
    number
INTO koo
FROM master..spt_values
WHERE 
    number > 1006;

ALTER TABLE dbo.foo ADD PRIMARY KEY (number);
ALTER TABLE dbo.boo ADD PRIMARY KEY (number);
ALTER TABLE dbo.koo ADD PRIMARY KEY (number);

Given these tables and data, let's look at the input tree for a three-way UNION query:

SELECT f.number FROM dbo.foo AS f
UNION
SELECT b.number FROM dbo.boo AS b
UNION
SELECT k.number FROM dbo.koo AS k
OPTION (QUERYTRACEON 3604, QUERYTRACEON 8605);

LogOp_Union
    OUTPUT(COL: Union1006 )
    CHILD(QCOL: [f].number)
    CHILD(QCOL: [b].number)
    CHILD(QCOL: [k].number)
    LogOp_Project
        LogOp_Get TBL: dbo.foo(alias TBL: f)
        AncOp_PrjList 
    LogOp_Project
        LogOp_Get TBL: dbo.boo(alias TBL: b)
        AncOp_PrjList 
    LogOp_Project
        LogOp_Get TBL: dbo.koo(alias TBL: k)
        AncOp_PrjList 

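The logical union operator has one output and three child inputs. After cost-based optimization, the physical tree chosen is a merge union with three inputs: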

SELECT f.number FROM dbo.foo AS f
UNION
SELECT b.number FROM dbo.boo AS b
UNION
SELECT k.number FROM dbo.koo AS k
OPTION (QUERYTRACEON 3604, QUERYTRACEON 8607);

PhyOp_MergeUnion
    PhyOp_Range TBL: dbo.foo(alias TBL: f)(1) ASC
    PhyOp_Range TBL: dbo.boo(alias TBL: b)(1) ASC
    PhyOp_Range TBL: dbo.koo(alias TBL: k)(1) ASC

The optimizer's output is then reworked into a form that the execution engine (which has no n-ary merge union) can handle:

[Image: merge union plan]

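The post-optimization rewrite expands the n-ary PhyOp_MergeUnion into multiple merge union operators. Notice how all of the estimated cost is attached to the "original" union operator; the other operators are costed at zero.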

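The fact that the optimizer reasons about the union as an n-ary operator is one explanation for why it does not consider rewriting your first example into the same plan as the second: the three-way union is a single tree node.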

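The second reason is that no constraint enforces the "lack of overlap". Until constraints are in place, there is no guarantee that a union between boo and koo will not produce duplicates, so we get a duplicate-removing plan (in this case, a merge union):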

SELECT b.number FROM dbo.boo AS b
UNION
SELECT k.number FROM dbo.koo AS k;

[Image: boo/koo plan without constraints]
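To see the difference between "the data happens not to overlap" and "the data is guaranteed not to overlap", a quick check like the one below can help. It is only a sketch against the sample tables created earlier. An empty result describes the rows at the moment the query runs and says nothing about future inserts, which is why the optimizer cannot rely on it.

```sql
-- Values that currently exist in both boo and koo.
-- With the sample data above (boo: 300 to 1005, koo: greater than 1006)
-- the ranges are disjoint, so this returns no rows.
SELECT b.number FROM dbo.boo AS b
INTERSECT
SELECT k.number FROM dbo.koo AS k;
```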

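Adding the following constraints ensures that the non-overlap condition cannot be violated without invalidating the cached plan for the query: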

ALTER TABLE dbo.foo WITH CHECK ADD CHECK (number < 1000);
ALTER TABLE dbo.boo WITH CHECK ADD CHECK (number BETWEEN 300 AND 1005);
ALTER TABLE dbo.koo WITH CHECK ADD CHECK (number > 1006);
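The optimizer only relies on check constraints that are enabled and trusted, which is what WITH CHECK is for in the statements above. As a sketch, and assuming the three sample tables live in the dbo schema of the current database, the sys.check_constraints catalog view shows whether that is the case:

```sql
-- List the check constraints on the three sample tables.
-- is_not_trusted = 0 and is_disabled = 0 mean the constraint was
-- validated against existing rows and is currently being enforced.
SELECT
    OBJECT_NAME(cc.parent_object_id) AS table_name,
    cc.name                          AS constraint_name,
    cc.definition,
    cc.is_not_trusted,
    cc.is_disabled
FROM sys.check_constraints AS cc
WHERE cc.parent_object_id IN
(
    OBJECT_ID(N'dbo.foo'),
    OBJECT_ID(N'dbo.boo'),
    OBJECT_ID(N'dbo.koo')
);
```

If a constraint had been added WITH NOCHECK instead, is_not_trusted would be 1 and the simplification described next should not be expected.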

Now the optimizer can safely perform a simple concatenation:

[Image: boo/koo plan with constraints]

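However, even with these constraints, the three-way union query still appears as a three-way union, because the optimizer does not generally consider splitting an n-ary operator to explore alternatives. N-ary operators are very useful for controlling the search space; given that the optimizer's goal is to find a good plan quickly, breaking them apart would usually be counterproductive.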

SELECT f.number FROM dbo.foo AS f
UNION
SELECT b.number FROM dbo.boo AS b
UNION
SELECT k.number FROM dbo.koo AS k;

[Image: merge union plan with constraints]

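When the query is written with UNION and UNION ALL, a single n-ary operator can no longer be used (the operator types do not match), so the tree has separate nodes: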

SELECT f.number FROM dbo.foo AS f
UNION
SELECT b.number FROM dbo.boo AS b
UNION ALL
SELECT k.number FROM dbo.koo AS k
OPTION (QUERYTRACEON 3604, QUERYTRACEON 8605);

LogOp_UnionAll
    OUTPUT(COL: Union1007 )
    CHILD(COL: Union1004 )
    CHILD(QCOL: [k].number)

    LogOp_Union
        OUTPUT(COL: Union1004 )
        CHILD(QCOL: [f].number)
        CHILD(QCOL: [b].number)

        LogOp_Project
            LogOp_Get TBL: dbo.foo(alias TBL: f)
            AncOp_PrjList 

        LogOp_Project
            LogOp_Get TBL: dbo.boo(alias TBL: b)
            AncOp_PrjList 

    LogOp_Project
        LogOp_Get TBL: dbo.koo(alias TBL: k)
        AncOp_PrjList 
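The nesting in this tree matches the left-to-right evaluation of set operators with equal precedence. As an illustrative sketch, the same query can be written with explicit parentheses to make the two separate nodes visible in the text of the query:

```sql
-- Inner node: duplicate-removing UNION of foo and boo.
-- Outer node: UNION ALL that simply appends koo.
(
    SELECT f.number FROM dbo.foo AS f
    UNION
    SELECT b.number FROM dbo.boo AS b
)
UNION ALL
SELECT k.number FROM dbo.koo AS k;
```

This form returns the same rows as the three-way UNION only because koo shares no values with foo or boo and is itself free of duplicates (its primary key guarantees the latter). Without both properties, UNION ALL could return duplicate rows.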