How to avoid using variables in a WHERE clause

Wil*_*Cau 16 sql-server-2008 sql-server

Given a (simplified) stored procedure such as this:

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN @startDate AND @endDate
END

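If the Sale table is large, the SELECT can take a long time to execute, apparently because the optimizer cannot optimize well when the range comes from a local variable. We tested running the SELECT part with variables and then with hard-coded dates, and the execution time went from ~9 minutes to ~1 second.

We have numerous stored procedures that query based on "fixed" date ranges (week, month, 8 weeks, etc.), so the input parameter is just @endDate and @startDate is calculated inside the procedure.

The question is: what is the best practice for avoiding variables in a WHERE clause so as not to compromise the optimizer?

The possibilities we came up with are shown below. Are any of these best practice, or is there another way?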

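Use a wrapper procedure to turn the variables into parameters.

Parameters don't affect the optimizer the same way local variables do.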

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
   DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
   EXECUTE DateRangeProc @startDate, @endDate
END

CREATE PROCEDURE DateRangeProc(@startDate DATE, @endDate DATE)
AS
BEGIN
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN @startDate AND @endDate
END

Use parameterized dynamic SQL.

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
  DECLARE @sql NVARCHAR(4000) = N'
    SELECT
      -- Stuff
    FROM Sale
    WHERE SaleDate BETWEEN @startDate AND @endDate
  '
  DECLARE @param NVARCHAR(4000) = N'@startDate DATE, @endDate DATE'
  EXECUTE sp_executesql @sql, @param, @startDate = @startDate, @endDate = @endDate
END

Use "hard-coded" dynamic SQL.

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
  DECLARE @sql NVARCHAR(4000) = N'
    SELECT
      -- Stuff
    FROM Sale
    WHERE SaleDate BETWEEN ''@startDate'' AND ''@endDate''
  '
  -- The placeholders are quoted above so the substituted dates become string literals.
  SET @sql = REPLACE(@sql, '@startDate', CONVERT(NCHAR(10), @startDate, 126))
  SET @sql = REPLACE(@sql, '@endDate', CONVERT(NCHAR(10), @endDate, 126))
  EXECUTE sp_executesql @sql
END

Use the DATEADD() function directly.

I'm not keen on this because calling functions in the WHERE clause also affects performance.

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN DATEADD(DAY, -6, @endDate) AND @endDate
END

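Use an optional parameter.

I'm not sure whether assigning to a parameter has the same problem as assigning to a variable, so this might not be an option. I don't really like this solution, but I'm including it for completeness.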

CREATE PROCEDURE WeeklyProc(@endDate DATE, @startDate DATE = NULL)
AS
BEGIN
  SET @startDate = DATEADD(DAY, -6, @endDate)
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN @startDate AND @endDate
END

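- UPDATE -

Thanks for the suggestions and comments. After reading them I ran some timing tests with the various approaches. I'm adding the results here as a reference.

Run 1 is without a plan. Run 2 is immediately after Run 1 with exactly the same parameters, so it uses the plan from Run 1.

The NoProc times are for running the SELECT queries manually in SSMS, outside a stored procedure.

TestProc1-7 are the queries from the original question.

TestProcA-B are based on a suggestion from Mikael Eriksson. The column in the database is a DATE, so I tried passing the parameter as a DATETIME and running with an implicit cast (testProcA) and an explicit cast (testProcB).

TestProcC-D are based on a suggestion from Kenneth Fisher. We already use a date lookup table for other things, but we don't have one with a specific column for each period range. The variation I tried still uses BETWEEN, but applies it to the smaller lookup table and joins to the larger table. I'm going to investigate further whether we can use specific lookup tables; our periods are fixed, but there are quite a few different ones.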

    Total rows in Sale table: 136,424,366

                          Run 1 (ms)       Run 2 (ms)
    Procedure             CPU    Elapsed   CPU    Elapsed   Comment
    NoProc constants      6567   62199     2870   719       Manual query with constants
    NoProc variables      9314   62424     3993   998       Manual query with variables
    testProc1             6801   62919     2871   736       Hard-coded range
    testProc2             8955   63190     3915   979       Parameter and variable range
    testProc3             8985   63152     3932   987       Wrapper procedure with parameter range
    testProc4             9142   63939     3931   977       Parameterized dynamic SQL
    testProc5             7269   62933     2933   728       Hard-coded dynamic SQL
    testProc6             9266   63421     3915   984       Use DATEADD on DATE
    testProc7             2044   13950     1092   1087      Dummy parameter
    testProcA             12120  61493     5491   1875      Use DATEADD on DATETIME without CAST
    testProcB             8612   61949     3932   978       Use DATEADD on DATETIME with CAST
    testProcC             8861   61651     3917   993       Use lookup table, Sale first
    testProcD             8625   61740     3994   1031      Use lookup table, Sale last

Here's the test code.

------ SETUP ------

IF OBJECT_ID(N'testDimDate', N'U') IS NOT NULL DROP TABLE testDimDate
IF OBJECT_ID(N'testProc1', N'P') IS NOT NULL DROP PROCEDURE testProc1
IF OBJECT_ID(N'testProc2', N'P') IS NOT NULL DROP PROCEDURE testProc2
IF OBJECT_ID(N'testProc3', N'P') IS NOT NULL DROP PROCEDURE testProc3
IF OBJECT_ID(N'testProc3a', N'P') IS NOT NULL DROP PROCEDURE testProc3a
IF OBJECT_ID(N'testProc4', N'P') IS NOT NULL DROP PROCEDURE testProc4
IF OBJECT_ID(N'testProc5', N'P') IS NOT NULL DROP PROCEDURE testProc5
IF OBJECT_ID(N'testProc6', N'P') IS NOT NULL DROP PROCEDURE testProc6
IF OBJECT_ID(N'testProc7', N'P') IS NOT NULL DROP PROCEDURE testProc7
IF OBJECT_ID(N'testProcA', N'P') IS NOT NULL DROP PROCEDURE testProcA
IF OBJECT_ID(N'testProcB', N'P') IS NOT NULL DROP PROCEDURE testProcB
IF OBJECT_ID(N'testProcC', N'P') IS NOT NULL DROP PROCEDURE testProcC
IF OBJECT_ID(N'testProcD', N'P') IS NOT NULL DROP PROCEDURE testProcD
GO

CREATE TABLE testDimDate
(
   DateKey DATE NOT NULL,
   CONSTRAINT PK_DimDate_DateKey UNIQUE NONCLUSTERED (DateKey ASC)
)
GO

DECLARE @dateTimeStart DATETIME = '2000-01-01'
DECLARE @dateTimeEnd DATETIME = '2100-01-01'
;WITH CTE AS
(
   --Anchor member defined
   SELECT @dateTimeStart FullDate
   UNION ALL
   --Recursive member defined referencing CTE
   SELECT FullDate + 1 FROM CTE WHERE FullDate + 1 <= @dateTimeEnd
)
SELECT
   CAST(FullDate AS DATE) AS DateKey
INTO #DimDate
FROM CTE
OPTION (MAXRECURSION 0)

INSERT INTO testDimDate (DateKey)
SELECT DateKey FROM #DimDate ORDER BY DateKey ASC

DROP TABLE #DimDate
GO

-- Hard coded date range.
CREATE PROCEDURE testProc1 AS
BEGIN
   SET NOCOUNT ON
   SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN '2012-12-09' AND '2012-12-10'
END
GO

-- Parameter and variable date range.
CREATE PROCEDURE testProc2(@endDate DATE) AS
BEGIN
   SET NOCOUNT ON
   DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
   SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
END
GO

-- Parameter date range.
CREATE PROCEDURE testProc3a(@startDate DATE, @endDate DATE) AS
BEGIN
   SET NOCOUNT ON
   SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
END
GO

-- Wrapper procedure.
CREATE PROCEDURE testProc3(@endDate DATE) AS
BEGIN
   SET NOCOUNT ON
   DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
   EXEC testProc3a @startDate, @endDate
END
GO

-- Parameterized dynamic SQL.
CREATE PROCEDURE testProc4(@endDate DATE) AS
BEGIN
   SET NOCOUNT ON
   DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
   DECLARE @sql NVARCHAR(4000) = N'SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate'
   DECLARE @param NVARCHAR(4000) = N'@startDate DATE, @endDate DATE'
   EXEC sp_executesql @sql, @param, @startDate = @startDate, @endDate = @endDate
END
GO

-- Hard coded dynamic SQL.
CREATE PROCEDURE testProc5(@endDate DATE) AS
BEGIN
   SET NOCOUNT ON
   DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
   DECLARE @sql NVARCHAR(4000) = N'SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN ''@startDate'' AND ''@endDate'''
   SET @sql = REPLACE(@sql, '@startDate', CONVERT(NCHAR(10), @startDate, 126))
   SET @sql = REPLACE(@sql, '@endDate', CONVERT(NCHAR(10), @endDate, 126))
   EXEC sp_executesql @sql
END
GO

-- Explicitly use DATEADD on a DATE.
CREATE PROCEDURE testProc6(@endDate DATE) AS
BEGIN
   SET NOCOUNT ON
   SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN DATEADD(DAY, -1, @endDate) AND @endDate
END
GO

-- Dummy parameter.
CREATE PROCEDURE testProc7(@endDate DATE, @startDate DATE = NULL) AS
BEGIN
   SET NOCOUNT ON
   SET @startDate = DATEADD(DAY, -1, @endDate)
   SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
END
GO

-- Explicitly use DATEADD on a DATETIME with implicit CAST for comparison with SaleDate.
-- Based on the answer from Mikael Eriksson.
CREATE PROCEDURE testProcA(@endDateTime DATETIME) AS
BEGIN
   SET NOCOUNT ON
   SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN DATEADD(DAY, -1, @endDateTime) AND @endDateTime
END
GO

-- Explicitly use DATEADD on a DATETIME but CAST to DATE for comparison with SaleDate.
-- Based on the answer from Mikael Eriksson.
CREATE PROCEDURE testProcB(@endDateTime DATETIME) AS
BEGIN
   SET NOCOUNT ON
   SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN CAST(DATEADD(DAY, -1, @endDateTime) AS DATE) AND CAST(@endDateTime AS DATE)
END
GO

-- Use a date lookup table, Sale first.
-- Based on the answer from Kenneth Fisher.
CREATE PROCEDURE testProcC(@endDate DATE) AS
BEGIN
   SET NOCOUNT ON
   DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
   SELECT SUM(Value) FROM Sale J INNER JOIN testDimDate D ON D.DateKey = J.SaleDate WHERE D.DateKey BETWEEN @startDate AND @endDate
END
GO

-- Use a date lookup table, Sale last.
-- Based on the answer from Kenneth Fisher.
CREATE PROCEDURE testProcD(@endDate DATE) AS
BEGIN
   SET NOCOUNT ON
   DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
   SELECT SUM(Value) FROM testDimDate D INNER JOIN Sale J ON J.SaleDate = D.DateKey WHERE D.DateKey BETWEEN @startDate AND @endDate
END
GO

------ TEST ------

SET STATISTICS TIME OFF

DECLARE @endDate DATE = '2012-12-10'
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)

DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

RAISERROR('Run 1: NoProc with constants', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN '2012-12-09' AND '2012-12-10'
SET STATISTICS TIME OFF

RAISERROR('Run 2: NoProc with constants', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN '2012-12-09' AND '2012-12-10'
SET STATISTICS TIME OFF

DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

RAISERROR('Run 1: NoProc with variables', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
SET STATISTICS TIME OFF

RAISERROR('Run 2: NoProc with variables', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
SET STATISTICS TIME OFF

DECLARE @sql NVARCHAR(4000)

DECLARE _cursor CURSOR LOCAL FAST_FORWARD FOR
   SELECT
      procedures.name,
      procedures.object_id
   FROM sys.procedures
   WHERE procedures.name LIKE 'testProc_'
   ORDER BY procedures.name ASC

OPEN _cursor

DECLARE @name SYSNAME
DECLARE @object_id INT

FETCH NEXT FROM _cursor INTO @name, @object_id
WHILE @@FETCH_STATUS = 0
BEGIN
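   -- Arguments are passed positionally. testProc7 declares (@endDate, @startDate),
   -- so in the two-parameter case the first value binds to @endDate.
   SET @sql = CASE (SELECT COUNT(*) FROM sys.parameters WHERE object_id = @object_id)
      WHEN 0 THEN @name
      WHEN 1 THEN @name + ' ''@endDate'''
      WHEN 2 THEN @name + ' ''@startDate'', ''@endDate'''
   END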

   SET @sql = REPLACE(@sql, '@name', @name)
   SET @sql = REPLACE(@sql, '@startDate', CONVERT(NVARCHAR(10), @startDate, 126))
   SET @sql = REPLACE(@sql, '@endDate', CONVERT(NVARCHAR(10), @endDate, 126))

   DBCC FREEPROCCACHE WITH NO_INFOMSGS
   DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

   RAISERROR('Run 1: %s', 0, 0, @sql) WITH NOWAIT
   SET STATISTICS TIME ON
   EXEC sp_executesql @sql
   SET STATISTICS TIME OFF

   RAISERROR('Run 2: %s', 0, 0, @sql) WITH NOWAIT
   SET STATISTICS TIME ON
   EXEC sp_executesql @sql
   SET STATISTICS TIME OFF

   FETCH NEXT FROM _cursor INTO @name, @object_id
END

CLOSE _cursor
DEALLOCATE _cursor

Mik*_*son 9

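Parameter sniffing is your friend almost all of the time, and you should write your queries so that it can be used. Parameter sniffing helps build the plan using the parameter values available when the query is compiled. The dark side of parameter sniffing is when the values used at compile time are not optimal for the queries to come.

The query in a stored procedure is compiled when the stored procedure is executed, not when the query itself is executed, so the values SQL Server has to deal with here...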

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN @startDate AND @endDate
END

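are a known value for @endDate and an unknown value for @startDate. That leaves SQL Server guessing that 30% of the rows are returned for the filter on @startDate, combined with whatever the statistics tell it for @endDate. If you have a big table with a lot of rows, that could give you a scan operation where you would benefit most from a seek.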
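To put that 30% guess in perspective, here is a rough calculation using the row count reported in the question's update. It only illustrates the size of the guess for the unknown @startDate predicate; it is not the optimizer's exact arithmetic, which also folds in the statistics for @endDate.

```sql
-- Illustration only: how many rows a 30% guess represents on the Sale table
-- from the question (136,424,366 rows).
DECLARE @totalRows BIGINT = 136424366
SELECT
  @totalRows AS TotalRows,
  CAST(@totalRows * 0.30 AS BIGINT) AS RowsAtThirtyPercentGuess
```

That is on the order of 41 million rows, far more than a range of a few days is likely to contain, which is why a scan can look attractive to the optimizer.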

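Your wrapper procedure solution makes sure that SQL Server sees the values when DateRangeProc is compiled, so it can use known values for both @endDate and @startDate.

Both of your dynamic queries lead to the same thing: the values are known at compile time.

The one with a default null value is a bit special. The values known to SQL Server at compile time are a known value for @endDate and null for @startDate. Using a null in a BETWEEN gives you 0 rows, but SQL Server always guesses 1 row in those cases. That might be a good thing here, but if you call the stored procedure with a large date interval where a scan would have been the best choice, it may end up doing a bunch of seeks.

I left "Use the DATEADD() function directly" to the end of this answer because it is the one I would use, and there is something strange about it as well.

First off, SQL Server does not call the function multiple times when it is used in the WHERE clause. DATEADD is considered a runtime constant.

I would have thought that DATEADD is evaluated when the query is compiled, so that you would get a good estimate of the number of rows returned. But that is not so in this case. SQL Server estimates based on the value in the parameter regardless of what you do with DATEADD (tested on SQL Server 2012), so in your case the estimate will be the number of rows registered on @endDate. I don't know why it does that, but it has to do with the use of the DATE data type. Switch to DATETIME in the stored procedure and the table and the estimate will be accurate, meaning that DATEADD is considered at compile time for DATETIME but not for DATE.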
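One way to check this on your own data is to capture the actual execution plan and compare the estimated and actual row counts on the Sale operator. The sketch below reuses testProc6 and testProcA from the question's test code. Note that in that test setup the SaleDate column is still a DATE, so reproducing the accurate DATETIME estimate described above would also require a DATETIME column, which is an assumption beyond the posted code.

```sql
-- Sketch: return the actual execution plan as XML alongside each result set,
-- then compare estimated rows with actual rows for the Sale table access.
SET STATISTICS XML ON
EXEC testProc6 '2012-12-10'   -- DATEADD applied to a DATE parameter
EXEC testProcA '2012-12-10'   -- DATEADD applied to a DATETIME parameter
SET STATISTICS XML OFF
```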

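So, to summarize this rather lengthy answer, I would recommend the wrapper procedure solution. It always allows SQL Server to use the provided values when compiling the query, without the hassle of dynamic SQL.

PS:

In the comments you got two suggestions.

OPTION (OPTIMIZE FOR UNKNOWN) will give you an estimate of 9% of the rows returned, and OPTION (RECOMPILE) will make SQL Server see the parameter values, since the query is recompiled every time.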
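As a minimal sketch of the second suggestion, here is the question's WeeklyProc with the hint added at statement level. SUM(Value) is borrowed from the question's test procedures as a stand-in for the elided select list, so treat it as an assumption. The trade-off is that the statement is compiled on every call.

```sql
-- Sketch only: the question's WeeklyProc with a statement-level recompile hint.
-- SUM(Value) stands in for the elided select list (assumption).
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
  SELECT SUM(Value)
  FROM Sale
  WHERE SaleDate BETWEEN @startDate AND @endDate
  OPTION (RECOMPILE) -- compile this statement on each execution
END
```

For a weekly report that runs rarely against a very large table, the extra compilation cost is usually small next to the cost of a poor plan, but that is worth measuring on your own workload.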