Jus*_*sel 8 sql sql-server query-optimization sql-server-2008
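While trying to solve a more complex problem, I ran into this smaller puzzle and am trying to understand what the optimizer is doing. Suppose I have a table named `MyTable` that is defined like this: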
CREATE TABLE MyTable (
GroupClosuresID int identity(1,1) not null,
SiteID int not null,
DeleteDateTime datetime null
, CONSTRAINT PK_MyTable PRIMARY KEY (GroupClosuresID, SiteID))
The table contains 286,685 rows, and running DBCC SHOW_STATISTICS('MyTable','PK_MyTable') produces:
Name Updated Rows Rows Sampled Steps Density Average key length String Index Filter Expression Unfiltered Rows

PK_MyTable Aug 10 2011 1:00PM 286685 286685 18 0.931986 8 NO NULL 286685
(1 row(s) affected)
All density Average Length Columns
------------- -------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3.743145E-06 4 GroupClosuresID
3.488149E-06 8 GroupClosuresID, SiteID
(2 row(s) affected)
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
------------ ------------- ------------- -------------------- --------------
1 0 8 0 1
129 1002 7 127 7.889764
242 826 6 112 7.375
531 2010 6 288 6.979167
717 1108 5 185 5.989189
889 822 4 171 4.807017
1401 2044 4 511 4
1763 1101 3 361 3.049861
14207 24780 1 12443 1.991481
81759 67071 1 67071 1
114457 31743 1 31743 1
117209 2047 1 2047 1
179109 61439 1 61439 1
181169 1535 1 1535 1
229410 47615 1 47615 1
235846 2047 1 2047 1
275456 39442 1 39442 1
275457 0 1 0 1
Now I run a query against this table, without having created any other indexes or statistics.
SELECT GroupClosuresID FROM MyTable WHERE SiteID = 1397 AND DeleteDateTime IS NULL
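Two new statistics objects now appear, one for the SiteID column and one for the DeleteDateTime column. Here they are, respectively (note: some irrelevant information has been left out):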
Name Updated Rows Rows Sampled Steps Density Average key length String Index Filter Expression Unfiltered Rows

_WA_Sys_00000002_7B0C223C Aug 10 2011 1:15PM 286685 216605 200 0.03384706 4 NO NULL 286685
(1 row(s) affected)
All density Average Length Columns
------------- -------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0.0007380074 4 SiteID
(1 row(s) affected)
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
------------ ------------- ------------- -------------------- --------------
.
.
.
1397 59.42782 16005.02 5 11.83174
.
.
.
Name Updated Rows Rows Sampled Steps Density Average key length String Index Filter Expression Unfiltered Rows

_WA_Sys_00000006_7B0C223C Aug 10 2011 1:15PM 286685 216605 201 0.7447883 0.8335911 NO NULL 286685
(1 row(s) affected)
All density Average Length Columns
------------- -------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0.0001065871 0.8335911 DeleteDateTime
(1 row(s) affected)
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
----------------------- ------------- ------------- -------------------- --------------
NULL 0 255827 0 1
.
.
.
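The execution plan generated for the query above held no surprises. It consists of a simple clustered index scan with 14282.3 estimated rows and 15676 actual rows. From what I understand about statistics and cost estimation, using the two histograms above we can multiply the selectivity of SiteID (16005.02 / 286685) by the selectivity of DeleteDateTime (255827 / 286685) to get a combined selectivity of 0.0498187307480119. Multiplying the total row count (286685) by that selectivity gives exactly what the optimizer produced: 14282.3.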
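The arithmetic in the paragraph above can be re-run with a short sketch. It is written in Python purely for illustration; the function name `independent_estimate` is made up here, and the inputs are the EQ_ROWS values copied from the two auto-created histograms. It assumes, as the paragraph does, that the two predicates are treated as independent.

```python
# Illustrative only: reproduces the estimate under the independence assumption.
TOTAL_ROWS = 286685          # rows in MyTable
SITEID_EQ_ROWS = 16005.02    # EQ_ROWS for SiteID = 1397 (sampled histogram)
NULL_EQ_ROWS = 255827        # EQ_ROWS for DeleteDateTime IS NULL


def independent_estimate(total_rows, *eq_rows):
    """Multiply per-predicate selectivities, then scale by the table row count."""
    selectivity = 1.0
    for rows in eq_rows:
        selectivity *= rows / total_rows
    return selectivity, selectivity * total_rows


selectivity, estimate = independent_estimate(TOTAL_ROWS, SITEID_EQ_ROWS, NULL_EQ_ROWS)
print(f"combined selectivity: {selectivity:.7f}")
print(f"estimated rows: {estimate:.1f}")
```

The printed estimate should agree with the 14282.3 shown in the plan, which is reasonably close to the 15676 rows actually returned.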
But here is where I get confused. I created an index with CREATE INDEX IX_MyTable ON Mytable (SiteID, DeleteDateTime), which creates its own statistics object:
Name Updated Rows Rows Sampled Steps Density Average key length String Index Filter Expression Unfiltered Rows

IX_MyTable Aug 10 2011 1:41PM 286685 286685 200 0.02749305 8.822645 NO NULL 286685
(1 row(s) affected)
All density Average Length Columns
------------- -------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
0.0007107321 4 SiteID
7.42611E-05 4.822645 SiteID, DeleteDateTime
3.488149E-06 8.822645 SiteID, DeleteDateTime, GroupClosuresID
(3 row(s) affected)
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
------------ ------------- ------------- -------------------- --------------
.
.
.
1397 504 15686 12 42
.
.
.
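When I run the same query as before (SELECT GroupClosuresID FROM MyTable WHERE SiteID = 1397 AND DeleteDateTime IS NULL), I still get 15676 rows back, but my estimated row count is now 181.82.
I have tried manipulating the numbers to work out where that estimate comes from, but I can't figure it out. I have to assume it has something to do with the density values for IX_MyTable.
Any help would be greatly appreciated. Thanks!!
EDIT: Here is the execution plan for that last query execution.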
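This one took some digging!
It is the product of two numbers:
The density of NULL in your date field (from your first set of statistics): 255827 / 286685 = .892363
The density of the first field (siteid) in your new index: 0.0007107321
The formula is: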
.0007107321 * 286685 = 203.7562
-- est. rows with your value in siteid based on even distribution of values
255827 / 286685 = 0.892363
-- Probability of a NULL across all rows
203.7562 * 0.892363 = 181.8245
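My guess is that because the row counts don't actually affect anything in this instance, the optimizer took the simplest path and just multiplied the probabilities together.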
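To make that arithmetic easy to re-run, here is a small sketch, in Python only for illustration. The variable names are made up; the inputs are the "All density" value for SiteID from IX_MyTable and the NULL EQ_ROWS from the DeleteDateTime histogram. It reproduces the 181.82 figure from the plan, but it is not a claim about how the optimizer is implemented internally.

```python
# Illustrative only: density-based estimate that matches the plan's 181.82.
TOTAL_ROWS = 286685
SITEID_DENSITY = 0.0007107321   # "All density" for SiteID in IX_MyTable
NULL_EQ_ROWS = 255827           # EQ_ROWS for NULL in the DeleteDateTime histogram

# Density is 1 / (number of distinct values), so this is the average
# number of rows per SiteID if the values were evenly distributed.
rows_per_site = SITEID_DENSITY * TOTAL_ROWS

# Fraction of all rows where DeleteDateTime IS NULL.
null_fraction = NULL_EQ_ROWS / TOTAL_ROWS

estimate = rows_per_site * null_fraction
print(f"rows per SiteID: {rows_per_site:.4f}")
print(f"NULL fraction: {null_fraction:.6f}")
print(f"estimated rows: {estimate:.2f}")
```

Note that the histogram step for 1397 (EQ_ROWS of 15686 in IX_MyTable) plays no part in this calculation, which is why the result lands so far from the 15676 rows actually returned.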