Speeding up an aggregate query on an 11-million-row table

hom*_*742 6 performance sql-server sql-server-2012 query-performance

I have a query that I'd like to speed up:

SELECT 
  sum(case when FlagDTD = 1 then Success else 0 end)   as SuccessDTD
, sum(case when FlagDTD = 1 then [Error] else 0 end)   as ErrorDTD
, round(sum(case when FlagDTD = 1 then Success else 0 end) * 100.0 / sum(FlagDTD),2) 
    as RateDTD
, sum(case when FlagYTD = 1 then Success else 0 end)   as SuccessYTD
, sum(case when FlagYTD = 1 then [Error] else 0 end)   as ErrorYTD
, round(sum(case when FlagYTD = 1 then Success else 0 end) * 100.0 / sum(FlagYTD),2)  
    as RateYTD
FROM
(
    SELECT 
      CASE WHEN Message = 'OK'  then 1 else 0 end as Success
    , CASE WHEN Message <> 'OK'  then 1 else 0 end as [Error]    
    , CASE WHEN DateCreated > 
      dateadd(HOUR, datediff(hh,GetUTCDate(), GetDate())*-1,  DATEADD(yy,
        DATEDIFF(yy,0,getdate()), 0)) then 1 else 0 end as FlagYTD
    , CASE WHEN DateCreated > 
      dateadd(HOUR, datediff(hh,GetUTCDate(), GetDate())*-1 , 
        convert(varchar(10), getdate(), 101)) then 1 else 0 end as FlagDTD
    FROM
      [Channels4].[dbo].[NotificationResult]
) Cnts

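I thought I might be able to create a view or an indexed view based on the subquery. However, when testing I couldn't create the indexed view because the "view uses an implicit conversion from string to datetime or smalldatetime".

I tried a regular view, but that didn't improve performance at all. My next thought is to rewrite the whole query. What are everyone's thoughts?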

Plan:

https://www.brentozar.com/pastetheplan/?id=rJqGY7iZW

Table structure:

CREATE TABLE [dbo].[NotificationResult]
(
    [IdNotificationResult] [bigint] IDENTITY(1,1) NOT NULL,
    [ApplicationGuid] [nvarchar](48) NOT NULL,
    [MessageGuid] [nvarchar](48) NOT NULL,
    [IdNotificationResultTypeStatus] [int] NOT NULL,
    [MessageStatusCode] [int] NULL,
    [Message] [varchar](max) NULL,
    [ExceptionStatusCode] [int] NULL,
    [ExceptionMessage] [varchar](max) NULL,
    [Subject] [varchar](max) NULL,
    [From] [varchar](max) NULL,
    [Timestamp] [datetime] NOT NULL,
    [IdCreatedBy] [bigint] NOT NULL,
    [IdLastUpdatedBy] [bigint] NOT NULL,
    [DateCreated] [datetime] NOT NULL,
    [DateLastUpdated] [datetime] NOT NULL,
  CONSTRAINT [PK_NotificationResult] PRIMARY KEY CLUSTERED 
  (
    [IdNotificationResult] 
  )
);

CREATE NONCLUSTERED INDEX [IX_NotificationResult_DateMessage] 
  ON [dbo].[NotificationResult] ( [DateCreated] ASC ) INCLUDE ( [Message]);

Doing a quick count, I get the same number for both YTD and DTD: 11739267. Number of rows with 'OK': 11782564.

Joe*_*ish 11

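As written, you technically haven't asked a question here. I'm assuming you want to improve the query's performance, but keep in mind that defining an acceptable response time is sometimes an important part of performance tuning. If the query runs once a day and takes a minute to finish, is it really worth 8 hours of your time to get it to run in 1 second?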

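More important than performance is correctness. It doesn't matter how long a query takes if it returns the wrong results, although taking a long time to return wrong results is certainly worse than taking a short time to return wrong results. Depending on your time zone, the UTC conversion logic may not work the way you expect. If any of the data is affected by daylight saving time, you can't use the current hour difference between local time and UTC to convert older data.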
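To make the conversion concrete, here is a small sketch (not part of the question) that computes the two cutoffs once into variables so they can be inspected. The variable names are made up for illustration, and I've swapped the question's `convert(varchar(10), getdate(), 101)` round trip for a `DATEADD`/`DATEDIFF` truncation to midnight, which is my substitution rather than the original code.

```sql
-- Offset between local time and UTC in whole hours, measured right now.
DECLARE @OffsetHours int = DATEDIFF(HOUR, GETUTCDATE(), GETDATE());

-- Local midnight on January 1 of the current year, and local midnight today.
DECLARE @YearStartLocal datetime = DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0);
DECLARE @DayStartLocal  datetime = DATEADD(DAY,  DATEDIFF(DAY,  0, GETDATE()), 0);

-- Shift both local boundaries to UTC using the *current* offset.
SELECT
  @OffsetHours AS CurrentOffsetHours
, DATEADD(HOUR, -@OffsetHours, @YearStartLocal) AS YtdCutoffUtc
, DATEADD(HOUR, -@OffsetHours, @DayStartLocal)  AS DtdCutoffUtc;
```

If the server sits in a time zone that observes daylight saving time, running this in summer applies the summer offset to a January 1 boundary, so `YtdCutoffUtc` ends up an hour off. That is the correctness risk described above.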

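Setting all of that aside, I'll try to show you a few ways to speed up the query in the question. You have a covering index, which is a good start, especially because it avoids reading the irrelevant blob data. However, there are still ways to make the query faster. I'm deliberately ignoring the hints you gave about data distribution because I want this to be a more general answer that can help others, and perhaps help you more if your data changes in the future.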

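I mocked up 10 million rows, half of which have 'OK' for the message while the other half have a long string. The dates are spread over a few years. Warning: this code took up about 60 GB of space and ran for about 10 minutes on my machine.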

CREATE TABLE [dbo].[NotificationResult]
(
    [IdNotificationResult] [bigint] IDENTITY(1,1) NOT NULL,
    [Message] [varchar](max) NULL,
    [DateCreated] [datetime] NOT NULL,
    [DateCreatedUTC] [date] NOT NULL,
    [Filler] VARCHAR(1000) NOT NULL,
CONSTRAINT [PK_NotificationResult] PRIMARY KEY CLUSTERED 
  (
    [IdNotificationResult] 
  )
);

INSERT INTO [dbo].[NotificationResult] WITH (TABLOCK) ([Message], [DateCreated], [DateCreatedUTC], [Filler])
SELECT CASE WHEN RN % 2 = 1 THEN 'OK' ELSE REPLICATE('Z', 3000) END
, DATEADD(SECOND, 11 * RN, '20140101')
, CAST(DATEADD(SECOND, 11 * RN, '20140101') AS DATE)
, REPLICATE('FILLER', 166)
FROM
(
    SELECT TOP (10000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
    CROSS JOIN master..spt_values t3
) t;

CREATE NONCLUSTERED INDEX [IX_NotificationResult_DateMessage] 
  ON [dbo].[NotificationResult] ( [DateCreated] ASC ) INCLUDE ( [Message]);

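If I run the query from the question, I get the same query plan as you. It took 38 seconds. Here are some performance statistics from the execution:

Table 'NotificationResult'. Scan count 5, logical reads 2509562, physical reads 0, read-ahead reads 2499799

SQL Server Execution Times: CPU time = 20626 ms, elapsed time = 37663 ms.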

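The first opportunity to improve performance is that there's an implied WHERE clause predicate in your CASE expressions. The query optimizer isn't smart enough to realize that rows from previous years can never contribute to the totals. We know that 24 hours is always longer than the offset between local time and UTC, so adding a filter like this shouldn't change the results: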

WHERE DateCreated > DATEADD(DAY, -1, dateadd(YEAR, datediff(YEAR, 0, getdate()), 0))

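Instead of reading and aggregating 10 million rows from the index, SQL Server now only needs to process 1.4 million rows. The savings from this optimization depend on how your data is distributed in the table. If all of your data falls in the current year, performance won't improve at all. For my data, the query now finishes in about 5 seconds, which is a big improvement:

Table 'NotificationResult'. Scan count 5, logical reads 352033, physical reads 1, read-ahead reads 350073

SQL Server Execution Times: CPU time = 3062 ms, elapsed time = 5354 ms.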

First plan
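For reference, here is a sketch of where that filter sits in the query from the question. Everything except the commented `WHERE` line is copied from the original; I'm assuming it runs in the database that holds `dbo.NotificationResult`.

```sql
SELECT 
  sum(case when FlagDTD = 1 then Success else 0 end)   as SuccessDTD
, sum(case when FlagDTD = 1 then [Error] else 0 end)   as ErrorDTD
, round(sum(case when FlagDTD = 1 then Success else 0 end) * 100.0 / sum(FlagDTD),2) 
    as RateDTD
, sum(case when FlagYTD = 1 then Success else 0 end)   as SuccessYTD
, sum(case when FlagYTD = 1 then [Error] else 0 end)   as ErrorYTD
, round(sum(case when FlagYTD = 1 then Success else 0 end) * 100.0 / sum(FlagYTD),2)  
    as RateYTD
FROM
(
    SELECT 
      CASE WHEN Message = 'OK'  then 1 else 0 end as Success
    , CASE WHEN Message <> 'OK'  then 1 else 0 end as [Error]    
    , CASE WHEN DateCreated > 
      dateadd(HOUR, datediff(hh,GetUTCDate(), GetDate())*-1,  DATEADD(yy,
        DATEDIFF(yy,0,getdate()), 0)) then 1 else 0 end as FlagYTD
    , CASE WHEN DateCreated > 
      dateadd(HOUR, datediff(hh,GetUTCDate(), GetDate())*-1 , 
        convert(varchar(10), getdate(), 101)) then 1 else 0 end as FlagDTD
    FROM
      [dbo].[NotificationResult]
    -- Added: skip rows that are more than a day older than January 1 of this year.
    -- The one-day margin covers any local/UTC offset.
    WHERE DateCreated > DATEADD(DAY, -1, dateadd(YEAR, datediff(YEAR, 0, getdate()), 0))
) Cnts;
```

Because `DateCreated` is the leading key of `IX_NotificationResult_DateMessage`, the filter lets SQL Server seek to the start of the range instead of scanning the whole index.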

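We can do better. We're storing a VARCHAR(MAX) column in the index when all we really need to know is whether the column value matches 'OK'. Without changing the table definition, we can get smaller indexes to seek or scan by creating three filtered indexes: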

CREATE NONCLUSTERED INDEX [IX_NotificationResult_Date_OK] 
  ON [dbo].[NotificationResult] ( [DateCreated] ASC )
  WHERE [Message] = 'OK';

CREATE NONCLUSTERED INDEX [IX_NotificationResult_Date_NOT_OK] 
  ON [dbo].[NotificationResult] ( [DateCreated] ASC )
  WHERE [Message] <> 'OK';

CREATE NONCLUSTERED INDEX [IX_NotificationResult_Date_NULL] 
  ON [dbo].[NotificationResult] ( [DateCreated] ASC )
  WHERE [Message] IS NULL;

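The idea here is that these indexes have the data we need but are much smaller on disk than the existing IX_NotificationResult_DateMessage index. Getting the query optimizer to use the filtered indexes requires a query rewrite and index hints (I'm not sure why). Here's one way to rewrite the query: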

SELECT 
  sum(case when FlagDTD = 1 then Success else 0 end)   as SuccessDTD
, sum(case when FlagDTD = 1 then [Error] else 0 end)   as ErrorDTD
, round(sum(case when FlagDTD = 1 then Success else 0 end) * 100.0 / sum(FlagDTD),2) 
    as RateDTD
, sum(case when FlagYTD = 1 then Success else 0 end)   as SuccessYTD
, sum(case when FlagYTD = 1 then [Error] else 0 end)   as ErrorYTD
, round(sum(case when FlagYTD = 1 then Success else 0 end) * 100.0 / sum(FlagYTD),2)  
    as RateYTD
FROM
(
    SELECT 
      Success
    , [Error]    
    , CASE WHEN DateCreated > 
      dateadd(HOUR, datediff(hh,GetUTCDate(), GetDate())*-1,  DATEADD(yy,
        DATEDIFF(yy,0,getdate()), 0)) then 1 else 0 end as FlagYTD
    , CASE WHEN DateCreated > 
      dateadd(HOUR, datediff(hh,GetUTCDate(), GetDate())*-1 , 
        convert(varchar(10), getdate(), 101)) then 1 else 0 end as FlagDTD
FROM
    (
    SELECT 1 Success, 0 Error, DateCreated 
    FROM
    [dbo].[NotificationResult] WITH (INDEX (IX_NotificationResult_Date_OK))
    WHERE DateCreated > DATEADD(DAY, -1, dateadd(YEAR, datediff(YEAR, 0, getdate()), 0))
    AND [Message] = 'OK'

    UNION ALL

    SELECT 0 Success, 1 Error, DateCreated 
    FROM
    [dbo].[NotificationResult] WITH (INDEX (IX_NotificationResult_Date_NOT_OK))
    WHERE DateCreated > DATEADD(DAY, -1, dateadd(YEAR, datediff(YEAR, 0, getdate()), 0))
    AND [Message] <> 'OK'

    UNION ALL

    SELECT 0 Success, 1 Error, DateCreated 
    FROM
    [dbo].[NotificationResult] WITH (INDEX (IX_NotificationResult_Date_NULL))
    WHERE DateCreated > DATEADD(DAY, -1, dateadd(YEAR, datediff(YEAR, 0, getdate()), 0))
    AND [Message] IS NULL
    ) t
) Cnts;

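Now the query finishes in under a second:

Table 'NotificationResult'. Scan count 10, logical reads 3874, physical reads 0, read-ahead reads 0

SQL Server Execution Times: CPU time = 2499 ms, elapsed time = 890 ms.

(The plan below is missing one of the indexes.)

Second plan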

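It's true that we're reading more rows from the indexes than before, but in total the indexes are about 100 times smaller than the original index.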
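If you want to check the size difference on your own system, a query along these lines should do it. This is a sketch, assuming the table lives in the `dbo` schema and that you have permission to read `sys.dm_db_partition_stats`; the column aliases are my own.

```sql
-- Row count and used space per index on the table, largest first.
SELECT
  i.name              AS IndexName
, i.has_filter        AS IsFiltered
, i.filter_definition AS FilterDefinition
, SUM(ps.row_count)                  AS RowCnt
, SUM(ps.used_page_count) * 8 / 1024 AS UsedMB   -- pages are 8 KB each
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
  ON  ps.object_id = i.object_id
  AND ps.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.NotificationResult')
GROUP BY i.name, i.has_filter, i.filter_definition
ORDER BY UsedMB DESC;
```

The three filtered indexes carry only the `DateCreated` key plus the clustered key, so each of their rows is a few dozen bytes rather than a copy of the message text.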

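If that query still isn't fast enough, you could consider an indexed view. If the table has a UTC date column, creating a view that can be indexed is simple: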

CREATE VIEW [NotificationResult_indexed]
WITH SCHEMABINDING
AS
SELECT
 [DateCreatedUTC]
, COUNT_BIG(*) AS CNT_BIG
, SUM(CASE WHEN Message = 'OK'  then 1 else 0 end) as Success
, SUM(CASE WHEN Message IS NULL OR Message <> 'OK'  then 1 else 0 end) as [Error]   
FROM dbo.[NotificationResult]
GROUP BY [DateCreatedUTC];
GO

CREATE UNIQUE CLUSTERED INDEX CLU_NotificationResult_indexed   
    ON [NotificationResult_indexed] ([DateCreatedUTC]);  
GO  
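The view relies on a `DateCreatedUTC` column, which exists in my mock table but not in the table from the question. One possible way to add it is a persisted computed column. This is only a sketch, and it assumes `DateCreated` is already stored in UTC, which the question's conversion logic suggests but doesn't state outright:

```sql
-- ASSUMPTION: DateCreated already holds UTC values.
-- CAST from datetime to date is deterministic, so the column can be persisted.
ALTER TABLE dbo.NotificationResult
  ADD DateCreatedUTC AS CAST(DateCreated AS date) PERSISTED;
```

Adding a persisted column to an 11-million-row table has to touch every row, so I'd try it on a copy of the data first.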

I believe this query roughly captures your intent, although I may have gotten some of the details wrong:

SELECT 
  sum(case when FlagDTD = 1 then Success else 0 end)   as SuccessDTD
, sum(case when FlagDTD = 1 then [Error] else 0 end)   as ErrorDTD
, round(sum(case when FlagDTD = 1 then Success else 0 end) * 100.0 / sum(FlagDTD),2) 
    as RateDTD
, sum(case when FlagYTD = 1 then Success else 0 end)   as SuccessYTD
, sum(case when FlagYTD = 1 then [Error] else 0 end)   as ErrorYTD
, round(sum(case when FlagYTD = 1 then Success else 0 end) * 100.0 / sum(FlagYTD),2)  
    as RateYTD
FROM
(
    SELECT 
      Success
    , [Error]    
    , CASE WHEN [DateCreatedUTC] > dateadd(YEAR, datediff(YEAR, 0, getdate()), 0)
       then 1 else 0 end as FlagYTD
    , CASE WHEN [DateCreatedUTC] > CAST(GETDATE() AS DATE)
       then 1 else 0 end as FlagDTD
    FROM
      [dbo].[NotificationResult_indexed]
      WHERE [DateCreatedUTC] > dateadd(YEAR, datediff(YEAR, 0, getdate()), 0)
) Cnts;

It finishes in 66 ms:

Table 'NotificationResult_indexed'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 66 ms.

Third plan
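One caveat: as far as I know, only Enterprise edition lets the optimizer pick up an indexed view automatically. On other editions the view reference is expanded to the base table unless you add a `NOEXPAND` hint. Here is a trimmed-down sketch of the YTD part with the hint, assuming the view was created in the `dbo` schema:

```sql
-- NOEXPAND tells SQL Server to read the view's clustered index directly
-- instead of expanding the view definition against the base table.
SELECT
  SUM(Success) AS SuccessYTD
, SUM([Error]) AS ErrorYTD
FROM dbo.NotificationResult_indexed WITH (NOEXPAND)
WHERE DateCreatedUTC > DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0);
```

If the logical reads against the view look more like the base table numbers than the 7 reads shown above, a missing `NOEXPAND` is the first thing I'd check.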