Filter first or join first?

Feb*_*ind 7 sql t-sql sql-server

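I originally wrote a query to find, for each customer, the years in which their total order count is greater than 1. In query 1, I filter the result set first and then join it to another result set that looks up the customer's name. I expected filtering first to perform better, since the join then has fewer rows to process. I then wrote a second query that joins first and filters afterwards, and it looks neater than the first one. The timings match my expectation, since all of the reported times are lower. But I am not sure which of the reported times matters most. Or is this case just a coincidence? How should I reason about performance here?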

use [AdventureWorks2012]
set statistics time on;

--1.filter first,join second
select tempC.*,tempP.FirstName,tempP.LastName
from 
(select  Year(OrderDate) As OrderYear,CustomerID,count(CustomerID) As CustomerOrderAmt
from Sales.SalesOrderHeader 
group by Year(OrderDate),CustomerID 
having count(CustomerID) >1
) as tempC
join(
select p.FirstName,p.LastName,c.CustomerID
from Person.Person as p join Sales.Customer as c on c.PersonID=p.BusinessEntityID
) as tempP
on tempC.CustomerID=tempP.CustomerID
order by tempC.OrderYear,tempC.CustomerID
GO

--2.join first,filter second

select Year(so.OrderDate) As Orderdate,so.CustomerID,count(so.CustomerID) As CustomerOrderAmt,p.FirstName,p.LastName
from Sales.SalesOrderHeader as so
join Sales.Customer as C on so.CustomerID=c.CustomerID
join Person.Person as p on c.PersonID=p.BusinessEntityID
group by Year(so.OrderDate),so.CustomerID,p.FirstName,p.LastName
having count(so.CustomerID)>1
go

Nat*_*kel 6

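The query optimizer is free to perform operations in any order that produces the same logical result. So even if you write the query to filter first and then join, the optimizer may still join and then filter, unless you force the order by using a table variable or a temp table.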

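If you really believe the optimizer is doing something stupid, you can try something like a table variable or a temp table. But what looks stupid probably isn't, because optimizers are very advanced.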
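As a rough sketch of what forcing the order with a temp table could look like for the question's query: the aggregate is materialized first, and the join runs as a separate statement against that intermediate result. The temp table name `#MultiOrderCustomers` is made up for illustration, and nothing here implies this will be faster than letting the optimizer decide.

```sql
USE AdventureWorks2012;
GO

-- Drop any leftover copy from an earlier run (this form also works on SQL Server 2012).
IF OBJECT_ID('tempdb..#MultiOrderCustomers') IS NOT NULL
    DROP TABLE #MultiOrderCustomers;

-- Step 1: materialize the filtered aggregate. This statement completes
-- before the join below starts, so the filter really does run first.
SELECT
      OrderYear        = YEAR(OrderDate)
    , CustomerID
    , CustomerOrderAmt = COUNT(CustomerID)
INTO #MultiOrderCustomers          -- hypothetical name, chosen for this example
FROM Sales.SalesOrderHeader
GROUP BY
      YEAR(OrderDate)
    , CustomerID
HAVING COUNT(CustomerID) > 1;

-- Step 2: join the intermediate result to the name lookup.
SELECT
      t.OrderYear
    , t.CustomerID
    , t.CustomerOrderAmt
    , p.FirstName
    , p.LastName
FROM #MultiOrderCustomers AS t
JOIN Sales.Customer AS c ON c.CustomerID = t.CustomerID
JOIN Person.Person  AS p ON p.BusinessEntityID = c.PersonID
ORDER BY
      t.OrderYear
    , t.CustomerID;

DROP TABLE #MultiOrderCustomers;
```

Writing the intermediate rows to tempdb has its own cost, so this is something to measure against the single-statement versions rather than assume to be an improvement.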

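That said, the way you write a query sometimes does affect what the optimizer does, so you should generally look at the execution plans. If they are the same, go with the clearest code. If they differ, test and test some more, then pick whichever performs best.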
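One possible way to make that comparison, sketched here under the assumption that the queries are run from SSMS or sqlcmd against AdventureWorks2012: turn on the statistics options, run both candidates, and compare the actual plans and read counts side by side. The two statements below are lightly condensed versions of the queries from the question.

```sql
USE AdventureWorks2012;
GO

SET STATISTICS IO ON;    -- logical and physical reads per table
SET STATISTICS TIME ON;  -- CPU time and elapsed time per statement
SET STATISTICS XML ON;   -- returns the actual execution plan as an extra result set
GO

-- Candidate A: aggregate and filter in a derived table, then join.
SELECT t.OrderYear, t.CustomerID, t.CustomerOrderAmt, p.FirstName, p.LastName
FROM (
    SELECT
          OrderYear        = YEAR(OrderDate)
        , CustomerID
        , CustomerOrderAmt = COUNT(CustomerID)
    FROM Sales.SalesOrderHeader
    GROUP BY YEAR(OrderDate), CustomerID
    HAVING COUNT(CustomerID) > 1
) AS t
JOIN Sales.Customer AS c ON c.CustomerID = t.CustomerID
JOIN Person.Person  AS p ON p.BusinessEntityID = c.PersonID;
GO

-- Candidate B: join first, then aggregate and filter.
SELECT
      OrderYear        = YEAR(so.OrderDate)
    , so.CustomerID
    , CustomerOrderAmt = COUNT(so.CustomerID)
    , p.FirstName
    , p.LastName
FROM Sales.SalesOrderHeader AS so
JOIN Sales.Customer AS c ON c.CustomerID = so.CustomerID
JOIN Person.Person  AS p ON p.BusinessEntityID = c.PersonID
GROUP BY YEAR(so.OrderDate), so.CustomerID, p.FirstName, p.LastName
HAVING COUNT(so.CustomerID) > 1;
GO

SET STATISTICS XML OFF;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO
```

If both plans show the same operators in the same order, the optimizer has treated the two spellings as equivalent and the choice comes down to readability. Logical reads tend to be a steadier yardstick than elapsed time, which also includes waits such as sending rows to the client.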


Dev*_*art 5

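I think it is good practice to use a subquery, because it reduces the total number of join operations as well as the number of columns in the GROUP BY block. So I can tell you right away that the first query is definitely more efficient.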

Queries:

SELECT  
      t.OrderYear
    , t.CustomerID
    , t.CustomerOrderAmt 
    , p.FirstName 
    , p.LastName
FROM ( 
     SELECT 
            OrderYear = YEAR(OrderDate)  
          , CustomerID 
          , CustomerOrderAmt = COUNT(CustomerID) 
     FROM Sales.SalesOrderHeader
     GROUP BY 
            YEAR(OrderDate) 
          , CustomerID
     HAVING COUNT(CustomerID) > 1
) t
JOIN ( 
     SELECT   
            p.FirstName 
          , p.LastName 
          , c.CustomerID
     FROM Person.Person p
     JOIN Sales.Customer c ON c.PersonID = p.BusinessEntityID
) p ON t.CustomerID = p.CustomerID
ORDER BY 
       t.OrderYear 
     , t.CustomerID

SELECT  
       Orderdate = YEAR(so.OrderDate)  
     , so.CustomerID 
     , CustomerOrderAmt = COUNT(so.CustomerID)  
     , FirstName = MAX(p.FirstName)
     , LastName = MAX(p.LastName)
FROM Sales.SalesOrderHeader so
JOIN Sales.Customer c ON so.CustomerID = c.CustomerID
JOIN Person.Person p ON c.PersonID = p.BusinessEntityID
GROUP BY 
       YEAR(so.OrderDate) 
     , so.CustomerID 
HAVING COUNT(so.CustomerID) > 1

Query cost:

[Image: query cost]

Execution times:

-- first query
SQL Server Execution Times:
   CPU time = 94 ms,  elapsed time = 395 ms.

-- second query   
SQL Server Execution Times:
   CPU time = 140 ms,  elapsed time = 480 ms.
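A single timed run like the one above can be skewed by caching or by other activity on the server. As a hedged sketch, one way to check whether the difference holds up is to run each query several times and then read the averages from the plan cache. This assumes you have VIEW SERVER STATE permission and that the plans are still cached; the `LIKE` filter on the table name is just an illustrative way to find the statements.

```sql
-- Average cost per execution for cached statements that touch Sales.SalesOrderHeader.
-- total_worker_time and total_elapsed_time are reported in microseconds.
SELECT TOP (20)
      qs.execution_count
    , avg_cpu_ms        = qs.total_worker_time   / qs.execution_count / 1000.0
    , avg_elapsed_ms    = qs.total_elapsed_time  / qs.execution_count / 1000.0
    , avg_logical_reads = qs.total_logical_reads / qs.execution_count
    , query_text        = st.text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE N'%Sales.SalesOrderHeader%'
  AND st.text NOT LIKE N'%dm_exec_query_stats%'   -- exclude this monitoring query itself
ORDER BY qs.last_execution_time DESC;
```

CPU time reflects the work the server did, while elapsed time also includes waiting (I/O, blocking, returning rows to the client), so a gap that shows up consistently in average CPU time and logical reads is less likely to be a coincidence.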