Join syntax / style performance considerations

Jef*_*nCO 5 sql-server sql-server-2012

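We recently found that we got a significant performance improvement in one of our stored procedures by changing the join syntax/style of a query from this...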

SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
    JOIN dbo.TableD d  -- <-- Nested join syntax
    ON d.yyy = c.yyy
ON c.xxx = b.xxx

to this...

SELECT b.bla, c.foo, d.bar
FROM dbo.TableB b
JOIN dbo.TableC c
ON c.xxx = b.xxx
JOIN dbo.TableD d   -- <-- Regular way
ON d.yyy = c.yyy

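Note: the real query joins 10 tables, with a mix of inner and outer joins. The tables are not large by SQL Server standards. There are no aggregations, but there is a DISTINCT on the output. Every join is to a primary key, but the foreign keys are not necessarily indexed.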

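We'll definitely be changing our ways, but I'm still curious what the proper "guidance" is on this style. I often use the "indented" style for joins I find "more readable", such as joins to lookup tables.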

Pau*_*ite 7

In a world where the query optimizer considered all possible join orders, and contained all possible logical transformations, the syntax we use for our queries would not matter at all.
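For inner joins like the ones in the question's example, the nested and flat forms really are just two spellings of the same logical query. The sketch below assumes hypothetical miniature temp tables #TableB, #TableC and #TableD standing in for the question's dbo tables and checks that both forms return the same rows. The parentheses are optional in T-SQL; they only make explicit that the inner ON clause belongs to the inner JOIN. This holds for inner joins only: with outer joins, moving an ON clause can change which rows are preserved, so the two forms are not automatically interchangeable.

```sql
-- Hypothetical miniature versions of the question's dbo.TableB/C/D.
CREATE TABLE #TableB (xxx int NOT NULL, bla varchar(10) NOT NULL);
CREATE TABLE #TableC (xxx int NOT NULL PRIMARY KEY, yyy int NOT NULL, foo varchar(10) NOT NULL);
CREATE TABLE #TableD (yyy int NOT NULL PRIMARY KEY, bar varchar(10) NOT NULL);

INSERT INTO #TableB (xxx, bla) VALUES (1, 'b1'), (2, 'b2'), (3, 'b3');
INSERT INTO #TableC (xxx, yyy, foo) VALUES (1, 10, 'c1'), (2, 20, 'c2');
INSERT INTO #TableD (yyy, bar) VALUES (10, 'd1');

-- Nested form, parenthesized to show that C is joined to D first
-- and the result is then joined to B.
SELECT b.bla, c.foo, d.bar
FROM #TableB AS b
JOIN (#TableC AS c
      JOIN #TableD AS d
          ON d.yyy = c.yyy)
    ON c.xxx = b.xxx
EXCEPT
-- Flat form from the question.
SELECT b.bla, c.foo, d.bar
FROM #TableB AS b
JOIN #TableC AS c
    ON c.xxx = b.xxx
JOIN #TableD AS d
    ON d.yyy = c.yyy;
-- Expect no rows: both forms return only (b1, c1, d1).
-- Swap the two SELECTs to check the other direction.

DROP TABLE #TableB, #TableC, #TableD;
```

Because the results are identical, any performance difference between the two forms comes from the path the optimizer takes, not from what the query means.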

As it is, the optimizer generally uses heuristics to pick an initial join order and explores a number of join order rewrites from there. It does this to avoid excessive compilation time and resource usage. It doesn't take all that many joins for the number of possible combinations to become unreasonable to explore exhaustively.

To take an extreme example, 42 joins are enough to generate more alternatives than there are atoms in the observable universe. More realistically, even 7 tables are enough to produce 665,280 alternatives. Although this is not a mind-boggling number, it would still take very significant time (and memory) to explore those alternatives completely.
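The 665,280 figure matches the usual count of binary (bushy) join trees over n distinct tables, n! × Catalan(n − 1) = (2n − 2)!/(n − 1)!. For 7 tables that is 5,040 leaf orderings × 132 tree shapes. Assuming that counting model, a small T-SQL loop reproduces the number:

```sql
-- Count of binary join trees over @n distinct tables:
-- (2n - 2)! / (n - 1)!  =  n * (n + 1) * ... * (2n - 2).
-- bigint holds the result up to @n = 15; larger values overflow.
DECLARE @n int = 7;
DECLARE @i int = @n;
DECLARE @alternatives bigint = 1;

WHILE @i <= 2 * @n - 2
BEGIN
    SET @alternatives *= @i;  -- multiply in the next factor
    SET @i += 1;
END;

SELECT @n AS table_count, @alternatives AS join_tree_count;
-- table_count = 7, join_tree_count = 665280
```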

Although the heuristics are largely based on the type of join (inner, outer, cross...) and cardinality estimates, the textual order of the query can also have an impact. Sometimes this is an optimizer limitation: NOT EXISTS clauses are not reordered, and outer join reordering is very limited. Even with simple inner joins, the interaction between textual order, initial join order heuristics, and optimizer internals can be difficult to predict with certainty.
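One way to see how much the written order can matter for a particular query is to compare its estimated plan with one compiled under OPTION (FORCE ORDER), a documented query hint that preserves the join order indicated by the query syntax. The sketch below assumes the AdventureWorks sample database and is meant as a diagnostic, not as a fix:

```sql
-- Optimizer is free to pick the join order.
SELECT P.Name AS ProductName, PS.Name AS SubcategoryName, SUM(TH.Quantity) AS TotalQuantity
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS PS
    ON PS.ProductSubcategoryID = P.ProductSubcategoryID
JOIN Production.TransactionHistory AS TH
    ON TH.ProductID = P.ProductID
GROUP BY P.ProductID, P.Name, PS.ProductSubcategoryID, PS.Name;

-- Same query, but the joins are kept in written order:
-- Product with ProductSubcategory first, then TransactionHistory.
SELECT P.Name AS ProductName, PS.Name AS SubcategoryName, SUM(TH.Quantity) AS TotalQuantity
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS PS
    ON PS.ProductSubcategoryID = P.ProductSubcategoryID
JOIN Production.TransactionHistory AS TH
    ON TH.ProductID = P.ProductID
GROUP BY P.ProductID, P.Name, PS.ProductSubcategoryID, PS.Name
OPTION (FORCE ORDER);
```

If the two estimated plans differ noticeably, the query is one where the textual order is likely to influence what the optimizer ends up with.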

To take an example using the AdventureWorks sample database, I can write a query using a common syntax form as:

SELECT
    P.Name,
    PS.Name,
    SUM(TH.Quantity),
    SUM(INV.Quantity)
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS PS
    ON PS.ProductSubcategoryID = P.ProductSubcategoryID
JOIN Production.TransactionHistory AS TH
    ON TH.ProductID = P.ProductID
JOIN Production.ProductInventory AS INV
    ON INV.ProductID = P.ProductID
GROUP BY
    P.ProductID,
    P.Name,
    PS.ProductSubcategoryID,
    PS.Name;

Before cost-based optimization, the logical query tree looks like this (note the join order is not the same as the written order):

[Image: logical tree]

I can (carefully) rewrite the query to use 'nested' syntax:

SELECT
    P.Name,
    PS.Name,
    SUM(TH.Quantity),
    SUM(INV.Quantity)
FROM Production.ProductSubcategory AS PS
JOIN Production.Product AS P
JOIN Production.TransactionHistory AS TH
JOIN Production.ProductInventory AS INV
    ON INV.ProductID = TH.ProductID
    ON TH.ProductID = P.ProductID
    ON P.ProductSubcategoryID = PS.ProductSubcategoryID
GROUP BY
    P.ProductID,
    P.Name,
    PS.ProductSubcategoryID,
    PS.Name;

In which case the logical tree at the same point is:

[Image: input tree 2]

The two different syntaxes produce a different initial join order in this case. After cost-based optimization, both produce the same output plan shape:

[Image: plan shape]

There are detailed differences between the two plans, with the 'nested' syntax producing a plan with a somewhat lower estimated cost:

[Image: plans 2]

The two inputs took a slightly different path through the optimizer, so it isn't all that surprising there are slight differences.
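To compare the two forms on your own system, one option is SET SHOWPLAN_XML ON, which makes SQL Server return the estimated plan for each batch as XML instead of executing it; the estimated cost appears in the StatementSubTreeCost attribute of the StmtSimple element. The sketch below assumes the AdventureWorks sample database and a client such as SSMS or sqlcmd, where GO is a batch separator rather than a T-SQL statement:

```sql
SET SHOWPLAN_XML ON;  -- must be the only statement in its batch
GO

-- Flat (non-nested) form: returns its estimated plan, not rows.
SELECT P.Name, PS.Name, SUM(TH.Quantity), SUM(INV.Quantity)
FROM Production.Product AS P
JOIN Production.ProductSubcategory AS PS
    ON PS.ProductSubcategoryID = P.ProductSubcategoryID
JOIN Production.TransactionHistory AS TH
    ON TH.ProductID = P.ProductID
JOIN Production.ProductInventory AS INV
    ON INV.ProductID = P.ProductID
GROUP BY P.ProductID, P.Name, PS.ProductSubcategoryID, PS.Name;
GO

-- Nested form: the innermost ON clause pairs with the innermost JOIN.
SELECT P.Name, PS.Name, SUM(TH.Quantity), SUM(INV.Quantity)
FROM Production.ProductSubcategory AS PS
JOIN Production.Product AS P
JOIN Production.TransactionHistory AS TH
JOIN Production.ProductInventory AS INV
    ON INV.ProductID = TH.ProductID
    ON TH.ProductID = P.ProductID
    ON P.ProductSubcategoryID = PS.ProductSubcategoryID
GROUP BY P.ProductID, P.Name, PS.ProductSubcategoryID, PS.Name;
GO

SET SHOWPLAN_XML OFF;
GO
```

Re-running the same comparison after a patch or upgrade is a cheap way to confirm that a syntax-dependent plan advantage still holds.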

In general, using different syntax will sometimes (definitely not always!) produce different plan results. There is no broad correlation between one syntax and better plans. Most people write and maintain queries using something like the non-nested join syntax, so it often makes practical sense to use that.

To summarize, my advice is to write queries using whichever syntax seems most natural (and maintainable!) to you and your peers. If you get a better plan for a specific query using a particular syntax, by all means use it, but be sure to test that you still get the better plan whenever you patch or upgrade SQL Server :)