What can be done to further improve the performance of multi-join and aggregate queries?

Spe*_*ine 3 performance index sql-server query-performance performance-tuning

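I have simulated a typical star schema here, and I mention two queries: the first simply joins the fact table to 2 dimension tables and 1 calendar table, and the second joins and aggregates.

I experimented by studying the execution plans and some of the suggested indexes I had read about, and created indexes accordingly. All of them improved performance to some degree.

My question is what more can be done in this scenario: which indexes could be applied, or how could the queries be modified, to get better performance and reduce execution time?

So, first, the queries that create and populate the tables and create the indexes: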

CREATE TABLE FactTable (id BIGINT IDENTITY PRIMARY KEY, FKDim1 BIGINT NOT NULL, FKDim2 BIGINT, DateRef DATETIME, Fact1 MONEY, Fact2 MONEY)
CREATE TABLE Dim1Table (id BIGINT IDENTITY PRIMARY KEY, Dim1Name NVARCHAR(20), Dim1Val1 MONEY, Dim1Val2 MONEY)
CREATE TABLE Dim2Table (id BIGINT IDENTITY PRIMARY KEY, Dim2Name NVARCHAR(20), Dim2Val1 MONEY, Dim2Val2 MONEY)
CREATE TABLE CalendarTable (id BIGINT IDENTITY PRIMARY KEY, [Date] DATETIME UNIQUE NONCLUSTERED, [Weekday] NVARCHAR(10), [Month] NVARCHAR(10))

ALTER TABLE FactTable ADD CONSTRAINT FK_Dim1 FOREIGN KEY (FKDim1 ) REFERENCES Dim1Table(ID);
ALTER TABLE FactTable ADD CONSTRAINT FK_Dim2 FOREIGN KEY (FKDim2 ) REFERENCES Dim2Table(ID);
ALTER TABLE FactTable ADD CONSTRAINT FK_Calendar FOREIGN KEY (DateRef) REFERENCES CalendarTable([Date]);

DECLARE @counter INT;
SET @counter = 1;

WHILE @counter < 10000
BEGIN
INSERT INTO Dim1Table(Dim1Name,Dim1Val1,Dim1Val2)VALUES('Dim1-'+CAST((@counter % 100) AS NVARCHAR),RAND() * 10000,RAND() * 20000);
INSERT INTO Dim2Table(Dim2Name,Dim2Val1,Dim2Val2)VALUES('Dim2-'+CAST(@counter AS NVARCHAR),RAND() * 10000,RAND() * 20000);
SET @counter = @counter + 1;
END

DECLARE @StartDate DATETIME
DECLARE @EndDate DATETIME
SET @StartDate = CAST('1/1/1995' AS DATETIME)
SET @EndDate = DATEADD(d, 3650, @StartDate)

WHILE @StartDate <= @EndDate
BEGIN
INSERT INTO CalendarTable([Date],[Weekday],[Month])SELECT @StartDate, DATENAME(dw, @StartDate), DATENAME(MONTH, @StartDate)
SET @StartDate = DATEADD(dd, 1, @StartDate)
END

SET @counter = 1;
WHILE @counter < 500000
BEGIN
INSERT INTO FactTable
(FKDim1,FKDim2,DateRef,Fact1,Fact2)VALUES(@counter % 10000,@counter % 10000, DATEADD(dd, @counter % 3650, CAST('1/1/1995' AS DATETIME)), RAND() * 10000, RAND() * 20000)
SET @counter = @counter + 1
END

Code to create the indexes:

CREATE NONCLUSTERED INDEX [Dim1TableIndex1] ON [dbo].[Dim1Table]([Dim1Name] ASC)INCLUDE([id], [Dim1Val1], [Dim1Val2]);
CREATE NONCLUSTERED INDEX [Dim1TableIndex2] ON [dbo].[Dim2Table]([Dim2Name] ASC)INCLUDE([id], [Dim2Val1], [Dim2Val2]);
CREATE NONCLUSTERED INDEX [FactTableIndex1] ON [dbo].FactTable(FKDim1 ASC)INCLUDE(FKDim2, DateRef, Fact1, Fact2);
CREATE NONCLUSTERED INDEX [FactTableIndex2] ON [dbo].FactTable(FKDim2 ASC)INCLUDE(FKDim1, DateRef, Fact1, Fact2);
CREATE UNIQUE NONCLUSTERED INDEX [CalnedarIndex1] ON [dbo].[CalendarTable]([Date] ASC)INCLUDE ([id],[Weekday],[Month]);
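The measurements that follow compare runs with these five indexes disabled and enabled. A minimal sketch of how such a comparison could be reproduced, assuming the indexes were toggled with `ALTER INDEX ... DISABLE` / `REBUILD` and the numbers were collected with `SET STATISTICS IO` and `SET STATISTICS TIME` (the question does not say how this was actually done):

```sql
-- Report logical reads and CPU/elapsed time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Disable the five nonclustered indexes (assumed method; the index
-- definitions are kept, only their data is dropped).
ALTER INDEX [Dim1TableIndex1] ON [dbo].[Dim1Table]     DISABLE;
ALTER INDEX [Dim1TableIndex2] ON [dbo].[Dim2Table]     DISABLE;
ALTER INDEX [FactTableIndex1] ON [dbo].[FactTable]     DISABLE;
ALTER INDEX [FactTableIndex2] ON [dbo].[FactTable]     DISABLE;
ALTER INDEX [CalnedarIndex1]  ON [dbo].[CalendarTable] DISABLE;

-- ... run the query under test here and note the statistics output ...

-- Re-enable the indexes; a disabled index is brought back by rebuilding it.
ALTER INDEX [Dim1TableIndex1] ON [dbo].[Dim1Table]     REBUILD;
ALTER INDEX [Dim1TableIndex2] ON [dbo].[Dim2Table]     REBUILD;
ALTER INDEX [FactTableIndex1] ON [dbo].[FactTable]     REBUILD;
ALTER INDEX [FactTableIndex2] ON [dbo].[FactTable]     REBUILD;
ALTER INDEX [CalnedarIndex1]  ON [dbo].[CalendarTable] REBUILD;

-- ... run the same query again and compare ...
```

Running each query more than once before comparing keeps the data cached in both cases, which matches the zero physical reads shown in the output.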

Query 1: a simple join of the fact table to the calendar and dimension tables, with a where clause:

SELECT D1.Dim1Name,
       D2.Dim2Name,
       C.[Date],
       C.[Weekday],
       C.[Month],
       D1.Dim1Val1,
       D2.Dim2Val2,
       F.Fact1,
       F.Fact2
FROM   FactTable F
       JOIN Dim1Table D1
            ON  D1.id = F.FKDim1
       JOIN Dim2Table D2
            ON  D2.id = F.FKDim2
       JOIN CalendarTable C
            ON  F.DateRef = C.Date

Execution details with the indexes disabled (all 5 mentioned above):

    (15000 row(s) affected)
Table 'CalendarTable'. Scan count 9, logical reads 82, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dim2Table'. Scan count 9, logical reads 205, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dim1Table'. Scan count 9, logical reads 190, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FactTable'. Scan count 9, logical reads 3890, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 159 ms,  elapsed time = 475 ms.

And the execution plan: [image]

With the indexes enabled:

(15000 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FactTable'. Scan count 300, logical reads 1083, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dim1Table'. Scan count 3, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CalendarTable'. Scan count 1, logical reads 27, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dim2Table'. Scan count 1, logical reads 67, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 125 ms,  elapsed time = 389 ms.

And the execution plan: [image]

The second query, which aggregates after joining:

SELECT D1.Dim1Name,
       C.[Month],
       Sum(D1.Dim1Val1) SumDim1Val1,
       Sum(D2.Dim2Val2) SumDim2Val2,
       Sum(F.Fact1) SumFact1,
       Avg(F.Fact2) Fact2Avg
FROM   FactTable F
       JOIN Dim1Table D1
            ON  D1.id = F.FKDim1
       JOIN Dim2Table D2
            ON  D2.id = F.FKDim2
       JOIN CalendarTable C
            ON  F.DateRef = C.Date
GROUP BY D1.Dim1Name, C.[MONTH]

Performance with all indexes disabled:

(1200 row(s) affected)
Table 'Dim1Table'. Scan count 9, logical reads 190, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CalendarTable'. Scan count 9, logical reads 82, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dim2Table'. Scan count 9, logical reads 205, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FactTable'. Scan count 9, logical reads 3890, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 2436 ms,  elapsed time = 554 ms.

And the execution plan: [image]

And with the indexes enabled:

(1200 row(s) affected)
Table 'Dim1Table'. Scan count 9, logical reads 181, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'CalendarTable'. Scan count 9, logical reads 76, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Dim2Table'. Scan count 9, logical reads 196, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FactTable'. Scan count 9, logical reads 3710, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 2060 ms,  elapsed time = 518 ms.

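And finally the execution plan: [image]

The improvement I got is not very significant, but when I consider a large number of rows, for example by removing the where clause from query 1, the indexes reduce the execution time from about 9.5 seconds to 8.3 seconds.

I will restate my questions here:

  1. How can the indexes be redesigned, or new ones added, to improve performance?
  2. How can performance be improved by redesigning the queries?
  3. What else can be done besides indexing and redesigning the queries?

I have shown simple examples, but I tried to cover some typical scenarios and query types in a star schema, so the concepts behind the answers to these specific questions should apply generally as well. I am using SQL Server 2012.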

Mar*_*ith 8

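There is little or no need, point or benefit in attempting to micro-optimise star schema queries with non-clustered indexes with included columns. Fact tables are built for scanning.

The indexes you have created in the example are subset copies of the parent table, and they are being scanned (no seeks). The minor performance improvement comes from scanning slightly fewer pages than the parent table. Given that a star schema is built to support ad hoc query patterns, it is not feasible to create indexes to support every possible query.

  • Create the fact table's clustered index on the date key. Most (typically all) fact table queries include a time element, and clustering on the date key supports range scans of the fact table rows.
  • Add non-clustered indexes on the foreign keys in the fact table to assist with highly selective queries. The foreign keys to the dimension tables can be created with NOCHECK to prevent any impact on ETL.
  • Cluster your dimension tables on their surrogate key.
  • Create a non-clustered index on the natural key of each dimension table.
  • Stop.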
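A minimal sketch of what the strategy in the list above might look like for the tables in the question. It assumes a fresh database in which `Dim1Table`, `Dim2Table` and `CalendarTable` already exist as defined in the question and `FactTable` has not yet been created. All index and constraint names are made up for illustration, and treating `Dim1Name` / `Dim2Name` as the natural keys is an assumption:

```sql
-- Fact table: make the surrogate key a NONCLUSTERED primary key so that
-- the clustered index can be placed on the date key instead.
CREATE TABLE FactTable (
    id      BIGINT IDENTITY NOT NULL,
    FKDim1  BIGINT NOT NULL,
    FKDim2  BIGINT,
    DateRef DATETIME,
    Fact1   MONEY,
    Fact2   MONEY,
    CONSTRAINT PK_FactTable PRIMARY KEY NONCLUSTERED (id)
);

-- Cluster the fact table on the date key to support date range scans.
CREATE CLUSTERED INDEX CIX_FactTable_DateRef ON FactTable (DateRef);

-- Plain non-clustered indexes on the foreign keys, for highly selective queries.
CREATE NONCLUSTERED INDEX IX_FactTable_FKDim1 ON FactTable (FKDim1);
CREATE NONCLUSTERED INDEX IX_FactTable_FKDim2 ON FactTable (FKDim2);

-- Foreign keys added WITH NOCHECK: existing rows are not validated
-- when the constraint is created.
ALTER TABLE FactTable WITH NOCHECK
    ADD CONSTRAINT FK_Dim1 FOREIGN KEY (FKDim1) REFERENCES Dim1Table (id);
ALTER TABLE FactTable WITH NOCHECK
    ADD CONSTRAINT FK_Dim2 FOREIGN KEY (FKDim2) REFERENCES Dim2Table (id);
ALTER TABLE FactTable WITH NOCHECK
    ADD CONSTRAINT FK_Calendar FOREIGN KEY (DateRef) REFERENCES CalendarTable ([Date]);

-- The dimension tables in the question are already clustered on their
-- surrogate key (id), because PRIMARY KEY defaults to a clustered index.
-- Natural-key indexes (assumed natural keys: the name columns).
CREATE NONCLUSTERED INDEX IX_Dim1Table_Dim1Name ON Dim1Table (Dim1Name);
CREATE NONCLUSTERED INDEX IX_Dim2Table_Dim2Name ON Dim2Table (Dim2Name);
```

The natural-key indexes are not declared unique here because the sample data in the question repeats `Dim1Name` values.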

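The optimiser detects star schema query patterns and has strategies to deal with them efficiently, using scans and hash joins in Standard edition, or bitmap filtering in Enterprise. Follow the indexing strategy outlined above and leave the optimiser to deal with the rest.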


Jon*_*gel 5

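In addition to Mark's excellent answer, there are a few other strategies you could add to your existing system (this is not an exhaustive list, of course):

  1. Pre-aggregated tables or indexed views. This physically materializes the results (or intermediate results) of the query, so SQL Server ends up scanning a much smaller index to return the full result set. It keeps your project within the same database, using techniques you are already familiar with.

  2. Analysis Services. This may be worth looking into if there are plans to support a lot of slicing and dicing of the data. Analysis Services is designed to automatically pre-aggregate data based on the parameters you feed it. The downside is that it may be completely new technology to you. While I am no expert in this area, I would say there is a learning curve. It is a very powerful tool, though.

  3. Result caching. If not very many rows are returned, and you find that users run the same queries over and over, cache the results and invalidate the cache when new data is loaded (or find a way to keep the cache up to date based on the new data).

Depending on the exact requirements of your project, these may not be applicable, but they do offer performance and response time advantages if they can be implemented (either alone or in combination).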
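As an illustration of the first option, here is a minimal sketch of a pre-aggregated table for the second query in the question. The table name `dbo.FactMonthlySummary` and the index name are hypothetical, and the sketch assumes the summary is rebuilt after each data load rather than maintained incrementally:

```sql
-- Materialize the aggregate once, after the fact table has been loaded.
SELECT D1.Dim1Name,
       C.[Month],
       SUM(D1.Dim1Val1) AS SumDim1Val1,
       SUM(D2.Dim2Val2) AS SumDim2Val2,
       SUM(F.Fact1)     AS SumFact1,
       AVG(F.Fact2)     AS Fact2Avg
INTO   dbo.FactMonthlySummary
FROM   FactTable F
       JOIN Dim1Table D1
            ON  D1.id = F.FKDim1
       JOIN Dim2Table D2
            ON  D2.id = F.FKDim2
       JOIN CalendarTable C
            ON  F.DateRef = C.[Date]
GROUP BY D1.Dim1Name, C.[Month];

-- One row per (Dim1Name, Month) group, so the grouping columns can
-- serve as a unique clustered key.
CREATE UNIQUE CLUSTERED INDEX CIX_FactMonthlySummary
    ON dbo.FactMonthlySummary (Dim1Name, [Month]);

-- Reports then read the small summary table instead of re-joining
-- and re-aggregating the fact table.
SELECT Dim1Name, [Month], SumDim1Val1, SumDim2Val2, SumFact1, Fact2Avg
FROM   dbo.FactMonthlySummary;
```

The trade-off is staleness: the summary reflects the data as of the last rebuild, so it would need to be dropped and recreated (or otherwise refreshed) as part of the load process.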