Identical queries - different performance

Gar*_*rom 4 performance sql-server sql-server-2014 query-performance

Database: SQL Server 12.0.5207

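I have several queries that are identical in every respect except for the value of one of the filter conditions. They run against the same table (not a copy of the schema on another server), so the indexes, resources and so on are the same. Everything is exactly the same.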

This query runs in under a second:

SELECT 
         MAX(MessageID) AS [MaxID]
FROM BoothComm.UniversalMessageQueue
WHERE 
       MessagePlatform = 'linux'

This query runs in under a second:

SELECT 
         MAX(MessageID) AS [MaxID]
FROM BoothComm.UniversalMessageQueue
WHERE 
       MessagePlatform = 'linux'
       AND
       MessageCategory = 'accounting'

This query runs in under a second:

SELECT 
         MAX(MessageID) AS [MaxID]
FROM BoothComm.UniversalMessageQueue
WHERE 
       MessagePlatform = 'windows'

So why does this one take nearly 30 seconds to run?

SELECT 
         MAX(MessageID) AS [MaxID]
FROM BoothComm.UniversalMessageQueue
WHERE 
       MessagePlatform = 'windows'
       AND
       MessageCategory = 'accounting'

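A colleague of mine added another index to the table, which solved the business problem of the delay. That index cut the 30 seconds down to a full second, while speeding the other queries up to instantaneous. Again, the execution plans are exactly the same (an index scan at 100%, as it should be). I took the advice of other forums and made sure the order of the columns in the query matches the order in which they are stored in the index...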

CREATE NONCLUSTERED INDEX [MessageID and Platform and Category] ON [BoothComm].[UniversalMessageQueue]
(
    [MessageID] ASC,
    [MessagePlatform] ASC,
    [MessageCategory] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

In case it helps, here is the table schema as well.

CREATE TABLE [BoothComm].[UniversalMessageQueue](
    [MessageQueueId] [bigint] IDENTITY(1,1) NOT NULL,
    [MessageID] [bigint] NOT NULL,
    [MessagePlatform] [nvarchar](50) NULL,
    [AssetNumber] [nvarchar](50) NOT NULL,
    [MessageState] [int] NULL,
    [MessageStateLabel] [nvarchar](50) NULL,
    [MessageType] [int] NULL,
    [MessageTypeLabel] [nvarchar](50) NULL,
    [MessageCategory] [nvarchar](50) NULL,
    [MessageSource] [int] NULL,
    [MessageSourceLabel] [nvarchar](50) NULL,
    [MessageSourceSerialNumber] [nvarchar](50) NULL,
    [MessageCreateDate] [datetime] NULL,
    [MessageTransmitDate] [datetime] NULL,
    [MessageReceivedDate] [datetime] NULL,
    [MessageStoredDate] [datetime] NULL,
    [XMLPayload] [nvarchar](max) NULL,
    [JSONPayload] [nvarchar](max) NULL,
    [SemanticXML] [nvarchar](max) NULL,
    [SemanticJSON] [nvarchar](max) NULL,
    [MessageSequenceNumber] [int] NULL,
    [ERPImportDate] [datetime] NULL,
    [ERPImportStatus] [int] NULL,
    [ERPMsg] [nvarchar](max) NULL,
    [NormalizationDate] [datetime] NULL,
    [NormalizationStatus] [int] NULL,
    [NormalizationDesc] [nvarchar](max) NULL,
    [SemanticDate] [datetime] NULL,
    [SemanticStatus] [int] NULL,
    [SemanticDesc] [nvarchar](max) NULL,
    [CreatedDate] [datetime] NOT NULL DEFAULT (getdate()),
    [CreatedBy] [nvarchar](50) NOT NULL DEFAULT ('ETL'),
    [UpdatedDate] [datetime] NOT NULL DEFAULT (getdate()),
    [UpdatedBy] [nvarchar](50) NOT NULL DEFAULT ('ETL'),
    [ETL_ID] [uniqueidentifier] NULL,
PRIMARY KEY CLUSTERED 
(
    [MessageQueueId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
 CONSTRAINT [CK_ETL_Unique_MessageID_Platform] UNIQUE NONCLUSTERED 
(
    [MessageID] ASC,
    [MessagePlatform] ASC,
    [MessageType] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

That should give you enough code to reproduce the problem... just populate the table with about 11 million records and you can see it for yourself!


Since it came up a few times and I had not even thought to check, I looked at how many "windows" records there are compared with "linux" records.

SELECT COUNT(*) 
FROM BoothComm.UniversalMessageQueue 
WHERE 
    MessageCategory = 'Accounting'
    AND
    MessagePlatform = 'linux';
-- returned 1762461

SELECT COUNT(*) 
FROM BoothComm.UniversalMessageQueue 
WHERE 
    MessageCategory = 'Accounting'
    AND
    MessagePlatform = 'windows';
-- returned 11786

So... I guess the record count is not the problem?

ind*_*iri 8

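Right now you are getting an index scan because all the columns the query needs are in the index, and scanning the index is faster than scanning the table. However, it has to scan rather than seek, because the first column of the index (MessageID) is not in the WHERE clause and so does not limit the rows read. You need to rearrange the index columns so that MessagePlatform comes first, since it is always in your WHERE clause.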

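Depending on the size of your data and how fast your inserts need to be, you may want to consider two indexes. If you only want one index, I would go with the following: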

CREATE NONCLUSTERED INDEX [MessageID and Platform and Category] 
ON [BoothComm].[UniversalMessageQueue]
(
    [MessagePlatform] ASC,
    [MessageID] ASC,
    [MessageCategory] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
 DROP_EXISTING = OFF, ONLINE = OFF, 
 ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

But if you can afford two, I would go with:

CREATE NONCLUSTERED INDEX [MessageID and Platform] 
ON [BoothComm].[UniversalMessageQueue]
(
    [MessagePlatform] ASC,
    [MessageID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
 DROP_EXISTING = OFF, ONLINE = OFF, 
 ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [MessageID and Platform and Category] 
ON [BoothComm].[UniversalMessageQueue]
(
    [MessagePlatform] ASC,
    [MessageCategory] ASC,
    [MessageID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
 DROP_EXISTING = OFF, ONLINE = OFF, 
 ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

Which index it uses will depend on whether MessageCategory is in your WHERE clause.
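One way to see which index each query ends up using, besides reading the execution plan, is to look at the index usage counters. The following is a minimal sketch, assuming the table from the question and permission to read `sys.dm_db_index_usage_stats` (VIEW SERVER STATE on this version). The counters are cumulative since the last instance restart, so compare them before and after running a query.

```sql
-- Seeks vs. scans per index on the table, for the current database.
-- Indexes that have never been used show NULL counters.
SELECT i.name        AS IndexName,
       s.user_seeks,
       s.user_scans,
       s.user_lookups
FROM   sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON  s.object_id   = i.object_id
       AND s.index_id    = i.index_id
       AND s.database_id = DB_ID()
WHERE  i.object_id = OBJECT_ID(N'BoothComm.UniversalMessageQueue');
```

If the reordered indexes are doing their job, the queries should show up under `user_seeks` rather than `user_scans`.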


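There are far more Linux records than Windows records (1,762,461 versus 11,786). With the index as it is now, the scan starts at the largest MessageID and works its way down the list until it finds a row with a matching MessagePlatform (and MessageCategory, when that is filtered too). Because so many of the records are Linux, it hits one very quickly. Because there are far fewer Windows records, it has to scan much further, which takes longer.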
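To get a rough feel for how far that scan has to travel, you can count the rows whose MessageID is higher than the answer the slow query eventually returns. This is only a sketch: it assumes the plan really is a backward ordered scan of the index led by MessageID, and the inner query is the slow query itself, so expect it to take just as long until the index is changed.

```sql
-- Approximate number of index rows read before the first
-- windows/accounting row is reached when scanning MessageID downward.
SELECT COUNT(*) AS RowsReadBeforeFirstMatch
FROM   BoothComm.UniversalMessageQueue
WHERE  MessageID > ( SELECT MAX(MessageID)
                     FROM   BoothComm.UniversalMessageQueue
                     WHERE  MessagePlatform = 'windows'
                            AND MessageCategory = 'accounting' );
```

Running the same count with 'linux' in the inner query should give a much smaller number if this explanation holds.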


Eri*_*ing 8

Just to offer some additional indexing strategies, here is what I came up with.

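I am not sure about the rest of the data distribution in your table. I know that Accounting/Linux as a combination makes up 1,762,461 rows, and that Accounting/Windows makes up 11,786 rows. Assuming there are other departments and platforms, I stuck this data in a table.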

USE tempdb;

CREATE TABLE dbo.Whatever
(
    Id INT IDENTITY(1,1),
    MessageId INT NOT NULL,
    MessageCategory NVARCHAR(50),
    MessagePlatform NVARCHAR(50)
);

INSERT dbo.Whatever ( MessageId, MessageCategory, MessagePlatform )
SELECT TOP 1762461 x.n, 'Accounting', 'Linux'
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
        FROM sys.messages AS m
        CROSS JOIN sys.messages AS m2) AS x

INSERT dbo.Whatever ( MessageId, MessageCategory, MessagePlatform )
SELECT TOP 11786 x.n, 'Accounting', 'Windows'
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
        FROM sys.messages AS m
        CROSS JOIN sys.messages AS m2) AS x


INSERT dbo.Whatever ( MessageId, MessageCategory, MessagePlatform )
SELECT TOP 5000000 x.n, 'HR', 'Unix'
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
        FROM sys.messages AS m
        CROSS JOIN sys.messages AS m2) AS x


INSERT dbo.Whatever ( MessageId, MessageCategory, MessagePlatform )
SELECT TOP 1000000 x.n, 'Accounting', 'Mac'
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
        FROM sys.messages AS m
        CROSS JOIN sys.messages AS m2) AS x

INSERT dbo.Whatever ( MessageId, MessageCategory, MessagePlatform )
SELECT TOP 1000000 x.n, 'IT', 'Windows'
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
        FROM sys.messages AS m
        CROSS JOIN sys.messages AS m2) AS x

INSERT dbo.Whatever ( MessageId, MessageCategory, MessagePlatform )
SELECT TOP 1000000 x.n, 'IT', 'Linux'
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
        FROM sys.messages AS m
        CROSS JOIN sys.messages AS m2) AS x

INSERT dbo.Whatever ( MessageId, MessageCategory, MessagePlatform )
SELECT TOP 1000000 x.n, 'HR', 'Mac'
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
        FROM sys.messages AS m
        CROSS JOIN sys.messages AS m2) AS x

INSERT dbo.Whatever ( MessageId, MessageCategory, MessagePlatform )
SELECT TOP 1000000 x.n, 'HR', 'Windows'
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
        FROM sys.messages AS m
        CROSS JOIN sys.messages AS m2) AS x

ALTER TABLE dbo.Whatever ADD CONSTRAINT pk_thoughtful PRIMARY KEY CLUSTERED (Id)

Strategy 1: Since you are querying for a MAX(), it may make more sense to order the MessageId column in the index DESC. That may let you avoid an unneeded Sort in the plan.

CREATE INDEX ix_love_and_rockets
    ON dbo.Whatever
(
    MessagePlatform,
    MessageId DESC,
    MessageCategory );

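Strategy 2: If the MessagePlatform values you mostly care about are Windows and Linux, a filtered index on those values may make sense. I am going to stick with descending order for MessageId.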

CREATE INDEX ix_bauhaus_was_better
    ON dbo.Whatever
(
    MessageCategory,
    MessageId DESC )
    INCLUDE ( MessagePlatform )
    WHERE MessagePlatform IN ( 'Windows', 'Linux' );

Results: When running some sample queries, index usage is mixed.

SELECT MAX(w.MessageId) AS MaxId
FROM   dbo.Whatever AS w
WHERE  w.MessagePlatform = 'Linux'
       AND w.MessageCategory = 'Accounting'
       AND 1 = ( SELECT 1 );

SELECT MAX(w.MessageId) AS MaxId
FROM   dbo.Whatever AS w
WHERE  w.MessagePlatform = 'Windows'
       AND w.MessageCategory = 'Accounting'
       AND 1 = ( SELECT 1 );

SELECT MAX(w.MessageId) AS MaxId
FROM   dbo.Whatever AS w
WHERE  w.MessagePlatform = 'Windows'
       AND 1 = ( SELECT 1 );

SELECT MAX(w.MessageId) AS MaxId
FROM   dbo.Whatever AS w
WHERE  w.MessagePlatform = 'Linux'
       AND 1 = ( SELECT 1 );

Here are the statistics time and I/O results:

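  • Query 1

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'Whatever'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

  • Query 2

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'Whatever'. Scan count 1, logical reads 10938, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times: CPU time = 625 ms, elapsed time = 614 ms.

  • Query 3

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'Whatever'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

  • Query 4

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'Whatever'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.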
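Figures like these can be collected by switching on the session-level statistics options before running a query. A minimal sketch against the test table from this answer; the output appears on the Messages tab in Management Studio:

```sql
-- Report logical/physical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT MAX(w.MessageId) AS MaxId
FROM   dbo.Whatever AS w
WHERE  w.MessagePlatform = 'Windows'
       AND w.MessageCategory = 'Accounting'
       AND 1 = ( SELECT 1 );

-- Switch the options back off so later statements are not affected.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Logical reads are usually the more stable number to compare between runs, since elapsed time also depends on what else the server is doing.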

I talk a bit about both filtered and descending indexes on my blog here, if you want a little more information.

Why does query 2 take longer? Is the answer "the alphabet"?

If it were, creating the index with MessagePlatform in DESC order would flip which query is slow.

Consider just these two indexes:

CREATE INDEX ix_love_and_rockets
    ON dbo.Whatever
(
    MessagePlatform,
    MessageId DESC,
    MessageCategory );


CREATE INDEX ix_rockets_and_love
    ON dbo.Whatever
(
    MessagePlatform DESC,
    MessageId DESC,
    MessageCategory );

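We are changing the sort order of MessagePlatform. Now, if we run the same two queries with hints to use each of these indexes, the performance difference does not flip.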

SELECT MAX(w.MessageId) AS MaxId
FROM   dbo.Whatever AS w WITH (INDEX = ix_love_and_rockets)
WHERE  w.MessagePlatform = 'Linux'
       AND w.MessageCategory = 'Accounting'
       AND 1 = ( SELECT 1 );

SELECT MAX(w.MessageId) AS MaxId
FROM   dbo.Whatever AS w WITH (INDEX = ix_love_and_rockets)
WHERE  w.MessagePlatform = 'Windows'
       AND w.MessageCategory = 'Accounting'
       AND 1 = ( SELECT 1 );

SELECT MAX(w.MessageId) AS MaxId
FROM   dbo.Whatever AS w WITH (INDEX = ix_rockets_and_love)
WHERE  w.MessagePlatform = 'Linux'
       AND w.MessageCategory = 'Accounting'
       AND 1 = ( SELECT 1 );

SELECT MAX(w.MessageId) AS MaxId
FROM   dbo.Whatever AS w WITH (INDEX = ix_rockets_and_love)
WHERE  w.MessagePlatform = 'Windows'
       AND w.MessageCategory = 'Accounting'
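       AND 1 = ( SELECT 1 );

  • Query 1

    Table 'Whatever'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

  • Query 2

    Table 'Whatever'. Scan count 1, logical reads 9338, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times: CPU time = 687 ms, elapsed time = 684 ms.

  • Query 3

    Table 'Whatever'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

  • Query 4

    SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table 'Whatever'. Scan count 1, logical reads 9337, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times: CPU time = 672 ms, elapsed time = 678 ms.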

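Root cause: In both queries we seek on MessagePlatform, but we have a residual predicate on MessageCategory.

The difference is the data distribution. In my test data, Windows has many more rows to filter out on MessageCategory than Linux does; in particular, the Windows/HR combination contributes an extra 1 million rows.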
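To look at that distribution directly, a simple grouping over the test table is enough. This is a sketch against the `dbo.Whatever` table built earlier in this answer; the column aliases are just illustrative names.

```sql
-- Row counts and MessageId range for every platform/category pair.
-- After seeking to one platform, the index is read in MessageId order,
-- so categories whose MaxMessageId is above the target category's
-- have rows that must be read and discarded by the residual predicate.
SELECT   w.MessagePlatform,
         w.MessageCategory,
         COUNT(*)         AS RowsInGroup,
         MIN(w.MessageId) AS MinMessageId,
         MAX(w.MessageId) AS MaxMessageId
FROM     dbo.Whatever AS w
GROUP BY w.MessagePlatform, w.MessageCategory
ORDER BY w.MessagePlatform, MaxMessageId DESC;
```

With the inserts above, Windows/Accounting should sit well below Windows/HR and Windows/IT in the MaxMessageId column, while Linux/Accounting should sit at the top of its platform, which matches the fast and slow cases.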

Hope this helps!