Why does this index seek cause more reads than an index scan?

use*_*117 5 index sql-server index-tuning

From what I have read, an index seek is preferable to an index scan most of the time, so I have been experimenting a bit.

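I have a query that performs 993 reads (checked with SQL Profiler) when it uses an index scan. When it uses an index seek, it needs 44,347 reads. Either something is wrong or I am misunderstanding something.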

This is the query that uses the index scan:

select      t5.Id as t5Id
from        table1 t1
left join   table2 t2 on t2.Table1Id = t1.Id
left join   table3 t3 on t3.Table2Id = t2.Id
left join   table4 t4 on t4.Table3Id = t3.Id
left join   table5 t5 on t5.Table4Id = t4.Id

This is the query that uses the index seek:

select      t5.Id as t5Id
from        table1 t1
left join   table2 t2 on t2.Table1Id = t1.Id
left join   table3 t3 on t3.Table2Id = t2.Id
left join   table4 t4 on t4.Table3Id = t3.Id
left join   table5 t5 WITH (FORCESEEK) on t5.Table4Id = t4.Id

The tables are simple and straightforward. At the end I fill them with some dummy data, so the problem is easy to reproduce.

CREATE TABLE [dbo].[table1](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](max) NOT NULL,
CONSTRAINT [PK_table1] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO




CREATE TABLE [dbo].[table2](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Table1Id] [bigint] NOT NULL,
CONSTRAINT [PK_table2] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE     [dbo].[table2] WITH CHECK
ADD CONSTRAINT  [FK_table2_table1Id] FOREIGN KEY([table1Id])
REFERENCES      [dbo].[table1] ([Id])
GO

ALTER TABLE         [dbo].[table2]
CHECK CONSTRAINT    [FK_table2_table1Id]
GO


CREATE NONCLUSTERED INDEX [IdxTable2_FKTable1Id] ON [dbo].[table2]
(
    [Table1Id] ASC
)
INCLUDE (   [Id]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO






CREATE TABLE [dbo].[table3](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Table2Id] [bigint] NOT NULL,
CONSTRAINT [PK_table3] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE     [dbo].[table3] WITH CHECK
ADD CONSTRAINT  [FK_table3_table2Id] FOREIGN KEY([table2Id])
REFERENCES      [dbo].[table2] ([Id])
GO

ALTER TABLE         [dbo].[table3]
CHECK CONSTRAINT    [FK_table3_table2Id]
GO


CREATE NONCLUSTERED INDEX [IdxTable3_FKTable2Id] ON [dbo].[table3]
(
    [Table2Id] ASC
)
INCLUDE (   [Id]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO







CREATE TABLE [dbo].[table4](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Table3Id] [bigint] NOT NULL,
CONSTRAINT [PK_table4] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE     [dbo].[table4] WITH CHECK
ADD CONSTRAINT  [FK_table4_table3Id] FOREIGN KEY([table3Id])
REFERENCES      [dbo].[table3] ([Id])
GO

ALTER TABLE         [dbo].[table4]
CHECK CONSTRAINT    [FK_table4_table3Id]
GO

CREATE NONCLUSTERED INDEX [IdxTable4_FKTable3Id] ON [dbo].[table4]
(
    [Table3Id] ASC
)
INCLUDE (   [Id]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO





CREATE TABLE [dbo].[table5](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Table4Id] [bigint] NOT NULL,
    [Description] [nvarchar](2000) NOT NULL,
CONSTRAINT [PK_table5] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

ALTER TABLE     [dbo].[table5] WITH CHECK
ADD CONSTRAINT  [FK_table5_table4Id] FOREIGN KEY([table4Id])
REFERENCES      [dbo].[table4] ([Id])
GO

ALTER TABLE         [dbo].[table5]
CHECK CONSTRAINT    [FK_table5_table4Id]
GO

CREATE NONCLUSTERED INDEX [IdxTable5_FKTable4Id] ON [dbo].[table5]
(
    [Table4Id] ASC
)
INCLUDE (   [Id], [Description]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


set nocount on

DECLARE @i INT = 0;
DECLARE @j INT = 0;
DECLARE @k INT = 0;
DECLARE @l INT = 10;
DECLARE @m INT = 0;

declare @table1Id bigint
declare @table2Id bigint
declare @table3Id bigint
declare @table4Id bigint


begin tran
WHILE @i < 10
BEGIN
    INSERT INTO [dbo].[table1] ([Name]) VALUES (cast(@i as nvarchar(10)))
    SELECT @table1Id = SCOPE_IDENTITY()

    WHILE @j < 10
    BEGIN
        INSERT INTO [dbo].[table2] ([Table1Id]) VALUES (@table1Id)
        SELECT @table2Id = SCOPE_IDENTITY()

        WHILE @k < 10
        BEGIN
            INSERT INTO [dbo].[table3] ([Table2Id]) VALUES (@table2Id)
            SELECT @table3Id = SCOPE_IDENTITY()

            WHILE @l > 0
            BEGIN
                INSERT INTO [dbo].[table4] ([Table3Id]) VALUES (@table3Id)
                SELECT @table4Id = SCOPE_IDENTITY()

                WHILE @m < 10
                BEGIN
                    INSERT INTO [dbo].[table5] ([Table4Id], [Description]) VALUES (@table4Id, 'Not so long description')

                    SET @m = @m + 1;
                END;

                SET @m = 0;
                SET @l = @l - 1;
            END;

            SET @l = 10;
            SET @k = @k + 1;
        END;

        SET @k = 0;
        SET @j = @j + 1;
    END;

    SET @j = 0;
    SET @i = @i + 1;
END;
commit

Joe*_*ish 7

First, let's estimate the number of reads needed for an index scan on IdxTable5_FKTable4Id:

-- get an idea of number of pages in the index
SELECT *
FROM sys.dm_db_partition_stats s
WHERE OBJECT_NAME(s.object_id) IN ('table5')
AND index_id > 1;

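On my system, the results of that query suggest that SQL Server needs about 900 reads to read the index in full. To test this, I will run a simple query that SQL Server implements most efficiently as an index scan. The query below needs the Id column for every row in table5. SQL Server can get all of the data the query needs by looking only at the index. Because every row is needed, no work is wasted here.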

-- get logical reads after query execution
SET STATISTICS IO ON;

select t5.Id
from table5 t5;

Table 'table5'. Scan count 1, logical reads 900, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
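To cross-check the 900 logical reads against the size of the index itself, the page count can be narrowed down to the one nonclustered index. This is only a sketch: it assumes the tables live in the dbo schema as in the question's DDL, and the page count it reports includes non-leaf pages, so it should land near the logical read count rather than match it exactly.

```sql
-- Sketch: pages used by IdxTable5_FKTable4Id only (assumes the dbo schema from the question).
SELECT  i.name            AS index_name,
        s.used_page_count AS used_pages,  -- all pages used by this index, non-leaf levels included
        s.row_count       AS row_count
FROM    sys.dm_db_partition_stats AS s
JOIN    sys.indexes AS i
        ON  i.object_id = s.object_id
        AND i.index_id  = s.index_id
WHERE   s.object_id = OBJECT_ID(N'dbo.table5')
AND     i.name = N'IdxTable5_FKTable4Id';
```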

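Now let's consider your first test query, the one without a hint. The outer table of the final hash match is estimated at 9657 rows on my system. The query optimizer decides that an index scan on IdxTable5_FKTable4Id is a good enough plan. This is the query that I ran: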

select      t5.Id as t5Id
from        table1 t1
left join   table2 t2 on t2.Table1Id = t1.Id
left join   table3 t3 on t3.Table2Id = t2.Id
left join   table4 t4 on t4.Table3Id = t3.Id
left join   table5 t5 on t5.Table4Id = t4.Id;

Table 'table5'. Scan count 1, logical reads 900, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

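The outer table of the hash join actually has 10000 rows. However, because this is a hash join, SQL Server still has to scan all 100000 rows in the index, even if only 10000 were needed. That is why this query needs the same 900 logical reads as the first one. You could call that wasted effort on the part of the query optimizer. Would it be more efficient to use index seeks to fetch only the 10000 needed rows from the index?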

First, let's estimate the number of reads required. Your index has a depth of 3:

-- get the index depth of the index
SELECT INDEXPROPERTY ( object_ID('table5'), 'IdxTable5_FKTable4Id' , 'IndexDepth');

Also, for the purposes of this demo I will enable TF 8744 to get cleaner results:

-- Trace flag 8744: Disable pre-fetching for ranges
dbcc traceon(8744);

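I know that the outer table has 10000 rows, so one estimate for the number of logical reads is 10000 * (3 + 1) = 40000. For each row, that is 3 reads for the index depth and 1 to fetch the data.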
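The same estimate can be written as a small query so it is easy to redo for another index. This is a rough sketch, not a costing formula SQL Server itself uses; the variable names are made up for illustration, and the dbo schema is assumed from the question's DDL.

```sql
-- Sketch: estimated logical reads = outer rows * (index depth + 1 read for the data).
DECLARE @outer_rows  int = 10000;  -- rows on the outer side of the join, from the text above
DECLARE @index_depth int = INDEXPROPERTY(OBJECT_ID(N'dbo.table5'), N'IdxTable5_FKTable4Id', 'IndexDepth');

SELECT  @index_depth                     AS index_depth,
        @outer_rows * (@index_depth + 1) AS estimated_logical_reads;
```

With an index depth of 3 this works out to the 40000 given above.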

This is the query that I tested:

select      t5.Id as t5Id
from        table1 t1
left join   table2 t2 on t2.Table1Id = t1.Id
left join   table3 t3 on t3.Table2Id = t2.Id
left join   table4 t4 on t4.Table3Id = t3.Id
left join   table5 t5 WITH (FORCESEEK) on t5.Table4Id = t4.Id;

Table 'table5'. Scan count 10000, logical reads 41118, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

That is very close to the estimate of 40000.

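What did we learn here? For this index, each index seek costs about 4 reads, while a scan has a fixed cost of 900 reads. This means that, from an IO point of view, index seeks are only more efficient when a small fraction of the data is fetched from the index. Otherwise it is more efficient to fetch all of the data with an index scan, even if not all of it is needed to return correct results for the query.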
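To put a rough number on "a small fraction", the break-even point can be worked out from the figures above. This is a back-of-the-envelope sketch that only counts logical reads; the variable names are illustrative, and the 10 rows per seek comes from the test data in the question (10 table5 rows per table4 row).

```sql
-- Sketch: how many seeks cost the same number of logical reads as one full scan?
DECLARE @scan_reads     int = 900;     -- logical reads for a full index scan, from the output above
DECLARE @reads_per_seek int = 4;       -- approximate logical reads per seek, from the output above
DECLARE @rows_per_seek  int = 10;      -- table5 rows per table4 row in the test data
DECLARE @total_rows     int = 100000;  -- rows in table5

SELECT  @scan_reads / @reads_per_seek AS break_even_seeks,
        100.0 * (@scan_reads / @reads_per_seek) * @rows_per_seek / @total_rows AS break_even_pct_of_rows;
```

For this index the break-even is about 225 seeks, or roughly 2.25% of the rows in table5. Beyond that, the scan reads fewer pages.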

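For a final test, let's try getting just the first 1000 rows back from the original test query. On my system, the query optimizer naturally picks index seeks even without a hint. This is the query that I ran: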

select TOP (1000)    t5.Id as t5Id
from        table1 t1
left join   table2 t2 on t2.Table1Id = t1.Id
left join   table3 t3 on t3.Table2Id = t2.Id
left join   table4 t4 on t4.Table3Id = t3.Id
left join   table5 t5 on t5.Table4Id = t4.Id;

Table 'table5'. Scan count 100, logical reads 409, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

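For this query, index seeks can be considered more efficient than an index scan. Note that the outer table has 100 rows, so 100 seeks are performed. A single seek can return more than one row. That is why the logical reads come to about 400 rather than 4000.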
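The demo turned on STATISTICS IO and trace flag 8744 for the session. If you followed along, a cleanup along these lines should put the session back the way it was. It assumes both were enabled at session level exactly as shown above, not globally.

```sql
-- Sketch: undo the session-level settings used during the demo.
SET STATISTICS IO OFF;

-- Turn off the trace flag that was enabled with dbcc traceon(8744).
DBCC TRACEOFF(8744);
```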