SQL Server insert performance

sol*_*ljy 1 sql-server clustered-index database-performance insert-statement database-fragmentation

Suppose I have the following table with a clustered index on a column (say, a):

CREATE TABLE Tmp
(
    a int,
    constraint pk_a primary key clustered (a)
)

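Then, let's assume that I have two very large sets of rows to insert into the table.

  • Set 1) values are sequentially increasing (i.e. {0,1,2,3,4,5,6,7,8,9,...,999999997,999999998,999999999})
  • Set 2) values are sequentially decreasing (i.e. {999999999,999999998,999999997,...,3,2,1,0})

Do you think there would be a performance difference between inserting the values of the first set and those of the second set? If so, why?

Thanks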

Mar*_*ith 5

SQL Server will generally try to sort large inserts into clustered index order prior to insert anyway.

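If the source for the insert is a table variable, however, it will not take the cardinality into account unless the statement is recompiled after the table variable is populated. Without this it will assume the insert is only one row.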
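As a rough way to see the one-row assumption for yourself, the sketch below populates a table variable and reads from it with and without OPTION (RECOMPILE). The variable name @Demo and the 1,000-row size are only illustrative. On SQL Server 2019 and later with compatibility level 150, table variable deferred compilation may make both estimates show the real row count.

```sql
-- Illustrative only: @Demo and the row count are assumptions, not part of the answer's script.
DECLARE @Demo TABLE (N int PRIMARY KEY);

INSERT INTO @Demo
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM sys.all_columns;

-- Return the actual plan for each statement so the estimates can be compared.
SET STATISTICS XML ON;

-- Compiled before @Demo was populated: the scan is typically estimated at 1 row.
SELECT COUNT(*) FROM @Demo;

-- Recompiled at execution time: the estimate reflects the rows actually present.
SELECT COUNT(*) FROM @Demo OPTION (RECOMPILE);

SET STATISTICS XML OFF;
```

Compare the estimated number of rows on the scan of @Demo in the two returned plans.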

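The script below demonstrates three possible scenarios.

  1. The insert source is already exactly in the correct order.
  2. The insert source is exactly in reverse order.
  3. The insert source is exactly in reverse order but OPTION (RECOMPILE) is used, so SQL Server compiles a plan suitable for inserting 1,000,000 rows.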

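Execution plans

[Image: execution plans]

The third one has a sort operator that first puts the inserted values into clustered index order.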

/*Create three separate identical tables*/
CREATE TABLE Tmp1(a int primary key clustered (a))
CREATE TABLE Tmp2(a int primary key clustered (a))
CREATE TABLE Tmp3(a int primary key clustered (a))

DBCC FREEPROCCACHE;

GO

DECLARE @Source TABLE (N INT PRIMARY KEY (N ASC))

INSERT INTO @Source
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT 0)) 
FROM sys.all_columns c1, sys.all_columns c2, sys.all_columns c3

SET STATISTICS TIME ON;

PRINT 'Tmp1'
INSERT INTO Tmp1
SELECT TOP (1000000) N
FROM @Source
ORDER BY N

PRINT 'Tmp2'
INSERT INTO Tmp2
SELECT  TOP (1000000) 1000000 - N
FROM @Source
ORDER BY N

PRINT 'Tmp3'
INSERT INTO Tmp3
SELECT 1000000 - N
FROM @Source
ORDER BY N
OPTION (RECOMPILE)

SET STATISTICS TIME OFF;

Verify results and clean up

SELECT object_name(object_id) AS name, 
       page_count, 
       avg_fragmentation_in_percent, 
       fragment_count, 
       avg_fragment_size_in_pages
FROM 
sys.dm_db_index_physical_stats(db_id(), object_id('Tmp1'), 1, NULL, 'DETAILED') 
WHERE  index_level = 0 
UNION ALL 
SELECT object_name(object_id) AS name, 
       page_count, 
       avg_fragmentation_in_percent, 
       fragment_count, 
       avg_fragment_size_in_pages
FROM 
sys.dm_db_index_physical_stats(db_id(), object_id('Tmp2'), 1, NULL, 'DETAILED') 
WHERE  index_level = 0 
UNION ALL 
SELECT object_name(object_id) AS name, 
       page_count, 
       avg_fragmentation_in_percent, 
       fragment_count, 
       avg_fragment_size_in_pages
FROM 
sys.dm_db_index_physical_stats(db_id(), object_id('Tmp3'), 1, NULL, 'DETAILED') 
WHERE  index_level = 0 

DROP TABLE Tmp1, Tmp2, Tmp3

STATISTICS TIME ON results

+------+----------+--------------+
|      | CPU Time | Elapsed Time |
+------+----------+--------------+
| Tmp1 | 6718 ms  | 6775 ms      |
| Tmp2 | 7469 ms  | 7240 ms      |
| Tmp3 | 7813 ms  | 9318 ms      |
+------+----------+--------------+

Fragmentation results

+------+------------+------------------------------+----------------+----------------------------+
| name | page_count | avg_fragmentation_in_percent | fragment_count | avg_fragment_size_in_pages |
+------+------------+------------------------------+----------------+----------------------------+
| Tmp1 |       3345 | 0.448430493                  |             17 | 196.7647059                |
| Tmp2 |       3345 | 99.97010463                  |           3345 | 1                          |
| Tmp3 |       3345 | 0.418535127                  |             16 | 209.0625                   |
+------+------------+------------------------------+----------------+----------------------------+

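Conclusion

In this case all three of them ended up using exactly the same number of pages. However, Tmp2 is 99.97% fragmented compared with only about 0.4% for the other two. The insert to Tmp3 took the longest because it required an additional sort step first, but this one-time cost needs to be weighed against the benefit that minimal fragmentation brings to future scans of the table.

The reason why Tmp2 is so heavily fragmented can be seen from the query below: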

WITH T AS
(
SELECT TOP 3000 file_id, page_id, a
FROM Tmp2
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%)
ORDER BY a
)
SELECT file_id, page_id, MIN(a), MAX(a)
FROM T 
group by file_id, page_id
ORDER BY MIN(a)

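With zero logical fragmentation, the page with the next highest key value would be the next highest page in the file, but here the pages are in exactly the opposite order to what they should be.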

+---------+---------+--------+--------+
| file_id | page_id | Min(a) | Max(a) |
+---------+---------+--------+--------+
|       1 |   26827 |      0 |    143 |
|       1 |   26826 |    144 |    442 |
|       1 |   26825 |    443 |    741 |
|       1 |   26824 |    742 |   1040 |
|       1 |   26823 |   1041 |   1339 |
|       1 |   26822 |   1340 |   1638 |
|       1 |   26821 |   1639 |   1937 |
|       1 |   26820 |   1938 |   2236 |
|       1 |   26819 |   2237 |   2535 |
|       1 |   26818 |   2536 |   2834 |
|       1 |   26817 |   2835 |   2999 |
+---------+---------+--------+--------+
Run Code Online (Sandbox Code Playgroud)

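The rows arrived in descending order, so for example the values 2834 down to 2536 were put into page 26818, then a new page was allocated for 2535, but this was page 26819 rather than page 26817.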
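To make the reversed page order easier to spot, the earlier query can be extended with LAG (SQL Server 2012 or later) to show how far each page sits from the page holding the previous key range. This is only a sketch: the Pages CTE and the column aliases min_a, max_a, and page_id_step are names introduced here, and it has to run before the DROP TABLE cleanup.

```sql
WITH T AS
(
    -- Same sampling as the query above: the 3000 lowest keys and their physical location.
    SELECT TOP (3000) file_id, page_id, a
    FROM Tmp2
    CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%)
    ORDER BY a
),
Pages AS
(
    -- One row per page with the key range it holds.
    SELECT file_id, page_id, MIN(a) AS min_a, MAX(a) AS max_a
    FROM T
    GROUP BY file_id, page_id
)
SELECT file_id, page_id, min_a, max_a,
       -- Distance from the page holding the previous key range (NULL on the first row).
       page_id - LAG(page_id) OVER (ORDER BY min_a) AS page_id_step
FROM Pages
ORDER BY min_a;
```

For the page list shown above, page_id_step would be -1 on every row after the first, whereas pages laid out in key order would show positive steps.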

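One possible reason why the insert to Tmp2 took longer than the insert to Tmp1 is that, because rows are inserted in exactly reverse order on each page, every insert to Tmp2 means the slot array on the page has to be rewritten, with all previous entries moved up to make room for the new arrival.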
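As a back-of-the-envelope illustration of that hypothesis, the arithmetic below counts slot entry moves for one full page. It assumes about 299 rows per leaf page (the size of the key ranges in the page list above) and that each new row shifts every slot entry already on the page by one position. The variable name @rows_per_page is made up for the example, and this is not a measurement.

```sql
-- Assumption: 299 rows per page, each insert lands ahead of all existing rows in key order.
DECLARE @rows_per_page int = 299;

-- The k-th insert on a page moves k - 1 existing slot entries, so the total is 0 + 1 + ... + (n - 1).
SELECT @rows_per_page * (@rows_per_page - 1) / 2 AS slot_entries_moved_per_page;  -- 44551
```

Ascending inserts append to the end of the slot array and move nothing under the same assumptions, which is consistent with the modest CPU time difference between Tmp1 and Tmp2.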