Gar*_*rom 4 performance sql-server sql-server-2014 query-performance
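Database: SQL Server 12.0.5207
I have several queries that are identical in every respect except for the value of one of the filter conditions. They run against the same table (not a schema copy on another server), so the indexes, resources, and so on are all the same. Everything is exactly the same.
This query runs in under a second: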
SELECT
MAX(MessageID) AS [MaxID]
FROM BoothComm.UniversalMessageQueue
WHERE
MessagePlatform = 'linux'
This query runs in under a second:
SELECT
MAX(MessageID) AS [MaxID]
FROM BoothComm.UniversalMessageQueue
WHERE
MessagePlatform = 'linux'
AND
MessageCategory = 'accounting'
This query runs in under a second:
SELECT
MAX(MessageID) AS [MaxID]
FROM BoothComm.UniversalMessageQueue
WHERE
MessagePlatform = 'windows'
So why does this one take nearly 30 seconds to run?
SELECT
MAX(MessageID) AS [MaxID]
FROM BoothComm.UniversalMessageQueue
WHERE
MessagePlatform = 'windows'
AND
MessageCategory = 'accounting'
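A colleague of mine added another index to the table, which solved the business problem of the delay. That index brought the 30 seconds down to a FULL second, while speeding the other queries up to instantaneous. Again, the execution plans are exactly the same:
(The index scan should show 100%.) I took advice from other forums and made sure the columns in the query appear in the same order as they are stored in the index...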
CREATE NONCLUSTERED INDEX [MessageID and Platform and Category] ON [BoothComm].[UniversalMessageQueue]
(
[MessageID] ASC,
[MessagePlatform] ASC,
[MessageCategory] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
In case it helps, here is the table schema as well.
CREATE TABLE [BoothComm].[UniversalMessageQueue](
[MessageQueueId] [bigint] IDENTITY(1,1) NOT NULL,
[MessageID] [bigint] NOT NULL,
[MessagePlatform] [nvarchar](50) NULL,
[AssetNumber] [nvarchar](50) NOT NULL,
[MessageState] [int] NULL,
[MessageStateLabel] [nvarchar](50) NULL,
[MessageType] [int] NULL,
[MessageTypeLabel] [nvarchar](50) NULL,
[MessageCategory] [nvarchar](50) NULL,
[MessageSource] [int] NULL,
[MessageSourceLabel] [nvarchar](50) NULL,
[MessageSourceSerialNumber] [nvarchar](50) NULL,
[MessageCreateDate] [datetime] NULL,
[MessageTransmitDate] [datetime] NULL,
[MessageReceivedDate] [datetime] NULL,
[MessageStoredDate] [datetime] NULL,
[XMLPayload] [nvarchar](max) NULL,
[JSONPayload] [nvarchar](max) NULL,
[SemanticXML] [nvarchar](max) NULL,
[SemanticJSON] [nvarchar](max) NULL,
[MessageSequenceNumber] [int] NULL,
[ERPImportDate] [datetime] NULL,
[ERPImportStatus] [int] NULL,
[ERPMsg] [nvarchar](max) NULL,
[NormalizationDate] [datetime] NULL,
[NormalizationStatus] [int] NULL,
[NormalizationDesc] [nvarchar](max) NULL,
[SemanticDate] [datetime] NULL,
[SemanticStatus] [int] NULL,
[SemanticDesc] [nvarchar](max) NULL,
[CreatedDate] [datetime] NOT NULL DEFAULT (getdate()),
[CreatedBy] [nvarchar](50) NOT NULL DEFAULT ('ETL'),
[UpdatedDate] [datetime] NOT NULL DEFAULT (getdate()),
[UpdatedBy] [nvarchar](50) NOT NULL DEFAULT ('ETL'),
[ETL_ID] [uniqueidentifier] NULL,
PRIMARY KEY CLUSTERED
(
[MessageQueueId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [CK_ETL_Unique_MessageID_Platform] UNIQUE NONCLUSTERED
(
[MessageID] ASC,
[MessagePlatform] ASC,
[MessageType] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
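That should give you enough code to reproduce the problem... just populate the table with about 11 million records and you can see the issue!
Since it was brought up a few times, and I hadn't even thought to check, I looked at how many 'windows' records there are versus 'linux' records.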
SELECT COUNT(*)
FROM BoothComm.UniversalMessageQueue
WHERE
MessageCategory = 'Accounting'
AND
MessagePlatform = 'linux';
-- returned 1762461
SELECT COUNT(*)
FROM BoothComm.UniversalMessageQueue
WHERE
MessageCategory = 'Accounting'
AND
MessagePlatform = 'windows';
-- returned 11786
So... I guess record count isn't the issue?
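Right now you are getting an index scan because every column the query needs is in the index, and scanning the index is faster than scanning the table. However, it has to scan rather than seek, because the first column of the index is not in the WHERE clause and so does nothing to restrict the rows read. You need to rearrange the index so that MessagePlatform comes first, since it is always in your WHERE clause.
Depending on the size of your data and how fast your inserts need to be, you may want to consider two indexes. If you only want one index, I would go with the following: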
CREATE NONCLUSTERED INDEX [MessageID and Platform and Category]
ON [BoothComm].[UniversalMessageQueue]
(
[MessagePlatform] ASC,
[MessageID] ASC,
[MessageCategory] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
But if you can afford two, I would go with:
CREATE NONCLUSTERED INDEX [MessageID and Platform]
ON [BoothComm].[UniversalMessageQueue]
(
[MessagePlatform] ASC,
[MessageID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [MessageID and Platform and Category]
ON [BoothComm].[UniversalMessageQueue]
(
[MessagePlatform] ASC,
[MessageCategory] ASC,
[MessageID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Which index it uses will depend on whether MessageCategory is in your WHERE clause.
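To check which index each query actually ends up using once both exist, one option is to look at the index usage counters. This is only a sketch: it assumes you run it in the database that holds the table and have permission to read the DMV, and the counters reset when the instance restarts.

```sql
-- Seek/scan counters per index on the table, for the current database.
-- Indexes that have not been touched since the last restart show NULLs.
SELECT  i.name            AS IndexName,
        s.user_seeks,
        s.user_scans,
        s.user_lookups,
        s.last_user_seek,
        s.last_user_scan
FROM    sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
        ON  s.object_id   = i.object_id
        AND s.index_id    = i.index_id
        AND s.database_id = DB_ID()
WHERE   i.object_id = OBJECT_ID(N'BoothComm.UniversalMessageQueue')
ORDER BY i.index_id;
```

Run the four queries from the question, then run this: the index that served each query should show its seek or scan counter going up.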
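There are far more Linux records than Windows records (1,762,461 versus 11,786). The way the index is built now, the index scan starts at the largest MessageID and works down the list until it finds a row with a matching MessagePlatform. Because so many more of the records are Linux, it hits one very quickly. Because there are far fewer Windows records, it has to scan much further, which takes longer.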
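The scan described above behaves roughly like a "first match wins" search in descending MessageID order. A rough way to picture it, as an illustration rather than the exact plan the optimizer builds, is the TOP (1) form of the slow query:

```sql
-- Illustration only: walk MessageID from the highest value downward
-- and stop at the first row that passes both filters.
SELECT TOP (1)
       MessageID AS [MaxID]
FROM   BoothComm.UniversalMessageQueue
WHERE  MessagePlatform = 'windows'
AND    MessageCategory = 'accounting'
ORDER BY MessageID DESC;
```

The cost depends on how many non-matching rows sit above the first match, not on how many rows match in total. One difference to keep in mind: when nothing matches, MAX() returns a single NULL row, while this form returns no rows.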
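Just to offer some additional indexing strategies, here is what I came up with.
I'm not sure about the rest of the data distribution in your table. I know that Accounting/Linux as a combination makes up 1,762,461 rows and Accounting/Windows makes up 11,786 rows. Assuming there are other departments and platforms, I stuck this data in a table.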
USE tempdb;
CREATE TABLE dbo.Whatever
(
Id INT IDENTITY(1,1),
MessageId INT NOT NULL,
MessageCategory NVARCHAR(50),
MessagePlatform NVARCHAR(50)
);
INSERT dbo.Whatever ( MessageId, MessageCategory, MessagePlatform )
SELECT TOP 1762461 x.n, 'Accounting', 'Linux'
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
FROM sys.messages AS m
CROSS JOIN sys.messages AS m2) AS x
INSERT dbo.Whatever ( MessageId, MessageCategory, MessagePlatform )
SELECT TOP 11786 x.n, 'Accounting', 'Windows'
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
FROM sys.messages AS m
CROSS JOIN sys.messages AS m2) AS x
INSERT dbo.Whatever ( MessageId, MessageCategory, MessagePlatform )
SELECT TOP 5000000 x.n, 'HR', 'Unix'
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
FROM sys.messages AS m
CROSS JOIN sys.messages AS m2) AS x
INSERT dbo.Whatever ( MessageId, MessageCategory, MessagePlatform )
SELECT TOP 1000000 x.n, 'Accounting', 'Mac'
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
FROM sys.messages AS m
CROSS JOIN sys.messages AS m2) AS x
INSERT dbo.Whatever ( MessageId, MessageCategory, MessagePlatform )
SELECT TOP 1000000 x.n, 'IT', 'Windows'
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
FROM sys.messages AS m
CROSS JOIN sys.messages AS m2) AS x
INSERT dbo.Whatever ( MessageId, MessageCategory, MessagePlatform )
SELECT TOP 1000000 x.n, 'IT', 'Linux'
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
FROM sys.messages AS m
CROSS JOIN sys.messages AS m2) AS x
INSERT dbo.Whatever ( MessageId, MessageCategory, MessagePlatform )
SELECT TOP 1000000 x.n, 'HR', 'Mac'
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
FROM sys.messages AS m
CROSS JOIN sys.messages AS m2) AS x
INSERT dbo.Whatever ( MessageId, MessageCategory, MessagePlatform )
SELECT TOP 1000000 x.n, 'HR', 'Windows'
FROM (SELECT ROW_NUMBER() OVER (ORDER BY @@ROWCOUNT) AS n
FROM sys.messages AS m
CROSS JOIN sys.messages AS m2) AS x
ALTER TABLE dbo.Whatever ADD CONSTRAINT pk_thoughtful PRIMARY KEY CLUSTERED (Id)
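Before testing indexes, it can help to confirm how the rows are spread across the platform and category combinations, since that distribution is what drives the results. A minimal check against the test table (the RowCnt and MaxMessageId aliases are my own):

```sql
-- Row count and highest MessageId for each platform/category pair.
SELECT  w.MessagePlatform,
        w.MessageCategory,
        COUNT(*)         AS RowCnt,
        MAX(w.MessageId) AS MaxMessageId
FROM    dbo.Whatever AS w
GROUP BY w.MessagePlatform,
         w.MessageCategory
ORDER BY w.MessagePlatform,
         w.MessageCategory;
```

The row counts should line up with the TOP values in the inserts above. The same query pointed at BoothComm.UniversalMessageQueue would show the real distribution, which I can only guess at here.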
First strategy:
Since you are querying for a MAX(), it may make more sense to order the MessageId column in the index DESC. This can let you avoid an unneeded Sort in the plan.
CREATE INDEX ix_love_and_rockets
ON dbo.Whatever
(
MessagePlatform,
MessageId DESC,
MessageCategory );
Second strategy:
If you mostly care about the MessagePlatform values Windows and Linux, a filtered index on those values may make sense. I'll stick with the descending order on MessageId.
CREATE INDEX ix_bauhaus_was_better
ON dbo.Whatever
(
MessageCategory,
MessageId DESC )
INCLUDE ( MessagePlatform )
WHERE MessagePlatform IN ( 'Windows', 'Linux' );
Results: when running some sample queries, index usage is mixed.
SELECT MAX(w.MessageId) AS MaxId
FROM dbo.Whatever AS w
WHERE w.MessagePlatform = 'Linux'
AND w.MessageCategory = 'Accounting'
AND 1 = ( SELECT 1 );
SELECT MAX(w.MessageId) AS MaxId
FROM dbo.Whatever AS w
WHERE w.MessagePlatform = 'Windows'
AND w.MessageCategory = 'Accounting'
AND 1 = ( SELECT 1 );
SELECT MAX(w.MessageId) AS MaxId
FROM dbo.Whatever AS w
WHERE w.MessagePlatform = 'Windows'
AND 1 = ( SELECT 1 );
SELECT MAX(w.MessageId) AS MaxId
FROM dbo.Whatever AS w
WHERE w.MessagePlatform = 'Linux'
AND 1 = ( SELECT 1 );
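Timing and I/O figures for queries like these are reported per session once statistics output is switched on. A minimal sketch of the wrapper, using the first sample query as the body:

```sql
-- Report CPU/elapsed time and logical/physical reads for each statement.
SET STATISTICS TIME, IO ON;

SELECT MAX(w.MessageId) AS MaxId
FROM   dbo.Whatever AS w
WHERE  w.MessagePlatform = 'Linux'
AND    w.MessageCategory = 'Accounting'
AND    1 = ( SELECT 1 );

-- Switch the extra output back off when finished.
SET STATISTICS TIME, IO OFF;
```

In Management Studio the output lands on the Messages tab. Logical reads is the figure to compare between runs, since elapsed time varies with caching and server load.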
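Here are the stats time and I/O results:
Query 1
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
Table 'Whatever'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
Query 2
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
Table 'Whatever'. Scan count 1, logical reads 10938, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 625 ms, elapsed time = 614 ms.
Query 3
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
Table 'Whatever'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
Query 4
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
Table 'Whatever'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.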
I talk a bit about both filtered and descending indexes on my blog here, if you want a little more information.
Why does query 2 take longer? A tempting answer is "the alphabet", so let's test it.
If you create the index with MessagePlatform in DESC order, the order flips.
Consider just these two indexes:
CREATE INDEX ix_love_and_rockets
ON dbo.Whatever
(
MessagePlatform,
MessageId DESC,
MessageCategory );
CREATE INDEX ix_rockets_and_love
ON dbo.Whatever
(
MessagePlatform DESC,
MessageId DESC,
MessageCategory );
We are changing the sort order of MessagePlatform. Now, if we run the same two queries with hints for each of these indexes, the performance difference does not flip.
SELECT MAX(w.MessageId) AS MaxId
FROM dbo.Whatever AS w WITH (INDEX = ix_love_and_rockets)
WHERE w.MessagePlatform = 'Linux'
AND w.MessageCategory = 'Accounting'
AND 1 = ( SELECT 1 );
SELECT MAX(w.MessageId) AS MaxId
FROM dbo.Whatever AS w WITH (INDEX = ix_love_and_rockets)
WHERE w.MessagePlatform = 'Windows'
AND w.MessageCategory = 'Accounting'
AND 1 = ( SELECT 1 );
SELECT MAX(w.MessageId) AS MaxId
FROM dbo.Whatever AS w WITH (INDEX = ix_rockets_and_love)
WHERE w.MessagePlatform = 'Linux'
AND w.MessageCategory = 'Accounting'
AND 1 = ( SELECT 1 );
SELECT MAX(w.MessageId) AS MaxId
FROM dbo.Whatever AS w WITH (INDEX = ix_rockets_and_love)
WHERE w.MessagePlatform = 'Windows'
AND w.MessageCategory = 'Accounting'
AND 1 = ( SELECT 1 );
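Query 1
Table 'Whatever'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
Query 2
Table 'Whatever'. Scan count 1, logical reads 9338, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 687 ms, elapsed time = 684 ms.
Query 3
Table 'Whatever'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
Query 4
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
Table 'Whatever'. Scan count 1, logical reads 9337, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 672 ms, elapsed time = 678 ms.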
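Root cause: in both queries we seek on MessagePlatform, but we have a residual predicate on MessageCategory.
The difference is data distribution. In my test data there are many more Windows rows than Linux rows to filter out on MessageCategory, specifically an extra 1 million rows for the Windows/HR combination.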
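If the residual predicate is the problem, one thing to try is putting both equality columns ahead of MessageId in the key, so the seek lands directly on the platform/category pair. The sketch below was not part of the test runs above; the index name is made up, and I have not measured it here.

```sql
-- Hypothetical index: both filter columns lead, MessageId stays descending.
CREATE INDEX ix_hypothetical_platform_category
    ON dbo.Whatever
(
    MessagePlatform,
    MessageCategory,
    MessageId DESC
);

SET STATISTICS TIME, IO ON;

-- Same slow query as before, hinted to the new index for comparison.
SELECT MAX(w.MessageId) AS MaxId
FROM   dbo.Whatever AS w WITH (INDEX = ix_hypothetical_platform_category)
WHERE  w.MessagePlatform = 'Windows'
AND    w.MessageCategory = 'Accounting'
AND    1 = ( SELECT 1 );

SET STATISTICS TIME, IO OFF;
```

I would expect the logical reads for the Windows/Accounting query to fall to a handful, in line with the other queries, but compare the statistics output on your own data before relying on that. The trade-off is that queries filtering on MessagePlatform alone get less help from this key order, which is the case for keeping a second index.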
Hope this helps!