将数据集与EXCEPT组合在一起检查LEFT JOIN中的IS NULL

Jul*_*ian 6 sql-server sqlperformance sql-server-2008 sql-execution-plan

我目前正在通过Microsoft SQL Server 2008 - 数据库开发(MCTS考试70-433)认证.在前面有关组合数据集的章节中,我遇到了EXCEPT(和INTERSECT)命令.一个示例显示如何使用EXCEPT从第一个表中没有相关值的表中获取所有值,如下所示:

SELECT EmployeeKey FROM DimEmployee
EXCEPT
SELECT EmployeeKey FROM FactResellerSales
Run Code Online (Sandbox Code Playgroud)

这个EXCEPT命令对我来说是新的,但是根据我今天所知道的,我仍然可以使用a轻松地解决问题,LEFT JOINIS NULL通过以下方式检查连接约束:

SELECT DISTINCT DimEmployee.EmployeeKey FROM DimEmployee
LEFT JOIN FactResellerSales ON FactResellerSales.EmployeeKey = DimEmployee.EmployeeKey
WHERE FactResellerSales.EmployeeKey IS NULL
Run Code Online (Sandbox Code Playgroud)

现在我开始怀疑其中哪一个具有最佳性能.我试图研究查询执行计划,但我不太擅长阅读这些,所以它并没有让我更聪明.对于使用的查询EXCEPT,计划如下所示:

|--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey], [Expr1006]) WITH UNORDERED PREFETCH)
    |--Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[DimEmployee].[IX_DimEmployee_SalesTerritoryKey]))
    |--Top(TOP EXPRESSION:((1)))
        |--Index Seek(OBJECT:([AdventureWorksDW2008].[dbo].[FactResellerSales].[IX_FactResellerSales_EmployeeKey]), SEEK:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]=[AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]) ORDERED FORWARD)
Run Code Online (Sandbox Code Playgroud)

对于使用LEFT JOIN它的人看起来像这样:

|--Stream Aggregate(GROUP BY:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]))
    |--Filter(WHERE:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey] IS NULL))
        |--Merge Join(Left Outer Join, MERGE:([AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey])=([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]), RESIDUAL:([AdventureWorksDW2008].[dbo].[FactResellerSales].[EmployeeKey]=[AdventureWorksDW2008].[dbo].[DimEmployee].[EmployeeKey]))
            |--Clustered Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[DimEmployee].[PK_DimEmployee_EmployeeKey]), ORDERED FORWARD)
            |--Index Scan(OBJECT:([AdventureWorksDW2008].[dbo].[FactResellerSales].[IX_FactResellerSales_EmployeeKey]), ORDERED FORWARD)
Run Code Online (Sandbox Code Playgroud)

查询中使用的表来自AdventureWorksDW2008示例数据库,因此下面我还包括两个表的创建脚本,以备正确回答问题所需的情况:

USE [AdventureWorksDW2008]
GO

/****** Object:  Table [dbo].[DimEmployee]    Script Date: 11/22/2010 20:30:20 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[DimEmployee](
    [EmployeeKey] [int] IDENTITY(1,1) NOT NULL,
    [ParentEmployeeKey] [int] NULL,
    [EmployeeNationalIDAlternateKey] [nvarchar](15) NULL,
    [ParentEmployeeNationalIDAlternateKey] [nvarchar](15) NULL,
    [SalesTerritoryKey] [int] NULL,
    [FirstName] [nvarchar](50) NOT NULL,
    [LastName] [nvarchar](50) NOT NULL,
    [MiddleName] [nvarchar](50) NULL,
    [NameStyle] [bit] NOT NULL,
    [Title] [nvarchar](50) NULL,
    [HireDate] [date] NULL,
    [BirthDate] [date] NULL,
    [LoginID] [nvarchar](256) NULL,
    [EmailAddress] [nvarchar](50) NULL,
    [Phone] [nvarchar](25) NULL,
    [MaritalStatus] [nchar](1) NULL,
    [EmergencyContactName] [nvarchar](50) NULL,
    [EmergencyContactPhone] [nvarchar](25) NULL,
    [SalariedFlag] [bit] NULL,
    [Gender] [nchar](1) NULL,
    [PayFrequency] [tinyint] NULL,
    [BaseRate] [money] NULL,
    [VacationHours] [smallint] NULL,
    [SickLeaveHours] [smallint] NULL,
    [CurrentFlag] [bit] NOT NULL,
    [SalesPersonFlag] [bit] NOT NULL,
    [DepartmentName] [nvarchar](50) NULL,
    [StartDate] [date] NULL,
    [EndDate] [date] NULL,
    [Status] [nvarchar](50) NULL,
 CONSTRAINT [PK_DimEmployee_EmployeeKey] PRIMARY KEY CLUSTERED 
(
    [EmployeeKey] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[DimEmployee]  WITH CHECK ADD  CONSTRAINT [FK_DimEmployee_DimEmployee] FOREIGN KEY([ParentEmployeeKey])
REFERENCES [dbo].[DimEmployee] ([EmployeeKey])
GO

ALTER TABLE [dbo].[DimEmployee] CHECK CONSTRAINT [FK_DimEmployee_DimEmployee]
GO

ALTER TABLE [dbo].[DimEmployee]  WITH CHECK ADD  CONSTRAINT [FK_DimEmployee_DimSalesTerritory] FOREIGN KEY([SalesTerritoryKey])
REFERENCES [dbo].[DimSalesTerritory] ([SalesTerritoryKey])
GO

ALTER TABLE [dbo].[DimEmployee] CHECK CONSTRAINT [FK_DimEmployee_DimSalesTerritory]
GO
Run Code Online (Sandbox Code Playgroud)

第二个表:

USE [AdventureWorksDW2008]
GO

/****** Object:  Table [dbo].[FactResellerSales]    Script Date: 11/22/2010 20:30:47 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[FactResellerSales](
    [ProductKey] [int] NOT NULL,
    [OrderDateKey] [int] NOT NULL,
    [DueDateKey] [int] NOT NULL,
    [ShipDateKey] [int] NOT NULL,
    [ResellerKey] [int] NOT NULL,
    [EmployeeKey] [int] NOT NULL,
    [PromotionKey] [int] NOT NULL,
    [CurrencyKey] [int] NOT NULL,
    [SalesTerritoryKey] [int] NOT NULL,
    [SalesOrderNumber] [nvarchar](20) NOT NULL,
    [SalesOrderLineNumber] [tinyint] NOT NULL,
    [RevisionNumber] [tinyint] NULL,
    [OrderQuantity] [smallint] NULL,
    [UnitPrice] [money] NULL,
    [ExtendedAmount] [money] NULL,
    [UnitPriceDiscountPct] [float] NULL,
    [DiscountAmount] [float] NULL,
    [ProductStandardCost] [money] NULL,
    [TotalProductCost] [money] NULL,
    [SalesAmount] [money] NULL,
    [TaxAmt] [money] NULL,
    [Freight] [money] NULL,
    [CarrierTrackingNumber] [nvarchar](25) NULL,
    [CustomerPONumber] [nvarchar](25) NULL,
 CONSTRAINT [PK_FactResellerSales_SalesOrderNumber_SalesOrderLineNumber] PRIMARY KEY CLUSTERED 
(
    [SalesOrderNumber] ASC,
    [SalesOrderLineNumber] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimCurrency] FOREIGN KEY([CurrencyKey])
REFERENCES [dbo].[DimCurrency] ([CurrencyKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimCurrency]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimDate] FOREIGN KEY([OrderDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimDate1] FOREIGN KEY([DueDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate1]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimDate2] FOREIGN KEY([ShipDateKey])
REFERENCES [dbo].[DimDate] ([DateKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimDate2]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimEmployee] FOREIGN KEY([EmployeeKey])
REFERENCES [dbo].[DimEmployee] ([EmployeeKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimEmployee]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimProduct] FOREIGN KEY([ProductKey])
REFERENCES [dbo].[DimProduct] ([ProductKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimProduct]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimPromotion] FOREIGN KEY([PromotionKey])
REFERENCES [dbo].[DimPromotion] ([PromotionKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimPromotion]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimReseller] FOREIGN KEY([ResellerKey])
REFERENCES [dbo].[DimReseller] ([ResellerKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimReseller]
GO

ALTER TABLE [dbo].[FactResellerSales]  WITH CHECK ADD  CONSTRAINT [FK_FactResellerSales_DimSalesTerritory] FOREIGN KEY([SalesTerritoryKey])
REFERENCES [dbo].[DimSalesTerritory] ([SalesTerritoryKey])
GO

ALTER TABLE [dbo].[FactResellerSales] CHECK CONSTRAINT [FK_FactResellerSales_DimSalesTerritory]
GO
Run Code Online (Sandbox Code Playgroud)

这个问题使用了一个非常具体的例子,但我也对一般信息感兴趣,这些信息关于何时/是否适合使用EXCEPT而不是a LEFT JOIN和check IS NULL.

我还注意到,如果我尝试运行第一个查询INTERCEPT,第二个查询中的等价物将是使用标准JOINSELECT DISTINCT DimEmployee.EmployeeKey(并且根本没有WHERE子句).但是,在这种情况下,执行计划在两种情况下完全相同.

更新
第二个查询的次要更新(请参阅修订历史记录),这会导致对该查询的更复杂的查询计划.我猜想更大的查询计划表明查询不太理想,但我仍然希望这个问题得到解答.

gbn*_*gbn 4

如果 LEFT JOIN 给出多行,这两个查询将会不同。也就是说,FactResellerSales 是 DimEmployee 的子级,DimEmployee 中每行有许多行。因此,正如您在 JOIN 示例中指出的那样,您需要 DISTINCT。

如果您将查询更改为使用 NOT EXISTS,您会得到相同的计划(左反半连接是 NOT EXISTS 的典型情况)

SELECT EmployeeKey
FROM DimEmployee DE 
WHERE
NOT EXISTS (SELECT * FROM 
        FactResellerSales FRS
    WHERE FRS.EmployeeKey = DE.EmployeeKey)
Run Code Online (Sandbox Code Playgroud)

此外,出于同样的原因,INTERSECT/EXISTS 很可能会给出相同的计划。

这是争论的另一个JOIN/EXISTS/IN方面OUTER JOIN/NOT EXISTS/NOT IN。INTERSECT/EXCEPT 是 (NOT) EXISTS 的稍微优雅的结构,你喜欢吗

编辑:

没有一个明显的问题...

就我个人而言,我不使用 OUTER JOIN 进行“存在”检查:我使用 EXISTS 或 NOT EXISTS(或者 INTERSECT/EXCEPT,如果我记得的话),因为它更明显你想要做什么。也就是说,如果我不需要“外部”表中的行,我不会使用它来避免出现 DISTINCT。

如果在这种情况下,则没有 CASE 使用 OUTER JOIN/IS NULL 恕我直言。当然,我在需要时使用 OUTER JOIN:这个答案仅适用于一种特定情况。