获取索引扫描而不是可能的索引搜索？

Question

获取索引扫描而不是可能的索引搜索？

Che*_*ain 4 index sql-server optimization

目前正在学习有关查询优化的一些东西，我一直在尝试不同的查询并偶然发现了这个“问题”。

我正在使用 AdventureWorks2014 数据库，我运行了这个简单的查询：

表结构（取自https://www.sqldatadictionary.com/AdventureWorks2014.pdf）：

SELECT C.CustomerID
FROM Sales.Customer AS C
WHERE C.CustomerID > 100

Run Code Online (Sandbox Code Playgroud)

返回 19,720 行

Sales.Customer 中的总行数 = 19,820

在检查以确保 CustomerID 实际上不仅是表的 PK 还具有聚集索引（但它使用非聚集索引）之后，情况确实如此：

EXEC SP_HELPINDEX 'Sales.Customer'

Run Code Online (Sandbox Code Playgroud)

这是执行计划？

https://www.brentozar.com/pastetheplan/?id=B1g1SihGr

我读过当面对大量数据和/或当它返回超过 50% 的数据集时，查询优化器将支持索引扫描。但是该表作为一个整体几乎没有 20,000 行（准确地说是 19,820 行），无论如何它都不是一张大表。

当我运行此查询时：

SELECT C.CustomerID
FROM Sales.Customer AS C
WHERE C.CustomerID > 30000

Run Code Online (Sandbox Code Playgroud)

返回 118 行

https://www.brentozar.com/pastetheplan/?id=Byyux32MS

我得到了一个索引查找，所以我认为这是由于“超过 50% 的情况”但是，我也运行了这个查询：

SELECT C.CustomerID
FROM Sales.Customer AS C
WHERE C.CustomerID > 20000

Run Code Online (Sandbox Code Playgroud)

返回 10,118 行

https://www.brentozar.com/pastetheplan/?id=HJ9oV33zr

它还使用了索引查找，即使它返回了超过 50% 的数据集。

那么这里发生了什么？

编辑：

打开 IO Statistics 后，>100 查询返回：

Table 'Customer'. Scan count 1, logical reads 37, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Run Code Online (Sandbox Code Playgroud)

当超过 20,000 人返回时：

Table 'Customer'. Scan count 1, logical reads 65, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Run Code Online (Sandbox Code Playgroud)

所以WITH(FORCESCAN)在 >20,000 中添加了选项，看看会发生什么：

Table 'Customer'. Scan count 1, logical reads 37, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Run Code Online (Sandbox Code Playgroud)

因此，即使查询优化器选择为此特定查询运行索引查找，它最终也会通过索引扫描（较少的逻辑读取）运行得更好。

Answer 1

Den*_*kin 5

您使用非相等谓词，因此您的“查找”操作实际上是从某个值（而不是“第一个”）开始的扫描，然后转到聚集索引叶级别的末尾。

另一方面，您只返回一列作为聚集索引键，因此使用任何索引都不会获得任何键查找操作。优化器必须估计什么会更便宜：扫描非聚集索引（叶级别上的两个 int 列）或部分扫描聚集索引（叶级别上的所有列）。

它根据当前的统计信息（多少行）和元数据（一行大小是多少）来估计它。我们看到优化器在>20,000谓词上犯了一个错误。

当面对大量数据和/或当它返回超过 50% 的数据集时，查询优化器将倾向于索引扫描。

当优化器必须选择执行聚集索引或表扫描与非聚集索引查找 + 键查找时，这是一个事实。

在您的情况下，如果您的索引CustomerID是非聚集的，您将始终看到对该索引的搜索操作，但是如果您随后将另一列添加到您的输出中，您将看到短结果集上的索引搜索 + RID 查找和大结果集上的表扫描。

归档时间：	6 年，1 月前
查看次数：	2775 次
最近记录：	6 年，1 月前