Lu4 9 performance sql-server sql-server-2012 query-performance
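This is my sixth attempt at asking this question, and also the shortest one. All my previous attempts produced something closer to a blog post than a question, but I assure you that my problem is real. It just concerns one big subject, and without all the details this question contains it would be unclear what my problem is. So here goes...
I have a database that stores data in a rather fancy way and provides several non-standard features required by my business process. The features are as follows:
There are probably other features that I forgot to mention.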
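All user data is stored in the Items table as JSON-encoded strings (ntext). All database operations go through two stored procedures, GetLatest and InsertSnapshot, which allow operating on the data similarly to how Git operates on source files.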
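The resulting data is linked (JOINed) on the front end into a fully linked graph, so in most cases there is no need to query the database.
It is also possible to store data in regular SQL columns instead of in JSON-encoded form. However, that adds to the overall complexity.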
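GetLatest returns its results as data in the form of instructions; consider the following diagram for an explanation:
The diagram shows the evolution of changes made to a single record. The arrows show the version each edit was based on (imagine a user updating some data offline, in parallel with updates made by an online user; such a case introduces a conflict, which is basically two versions of the data instead of one).
So calling GetLatest with the following input time spans produces the following record versions: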
GetLatest 0, 15 => 1 <= The data is created upon its first occurrence
GetLatest 0, 25 => 2 <= Inserting another version on top of first one overwrites the existing version
GetLatest 0, 30 => 3 <= The overwrite takes place as soon as the data is inserted
GetLatest 0, 45 => 3, 4 <= This is where the conflict is introduced in the system
GetLatest 0, 55 => 4, 5 <= You can still edit all the versions
GetLatest 0, 65 => 4, 6 <= You can still edit all the versions
GetLatest 0, 75 => 4, 6, 7 <= You can also create additional conflicts
GetLatest 0, 85 => 4, 7, 8 <= You can still edit records
GetLatest 0, 95 => 7, 8, 9 <= You can still edit records
GetLatest 0, 105 => 7, 8 <= Inserting a record with `Json` equal to `NULL` means that the record is deleted
GetLatest 0, 115 => 8 <= Deleting the conflicting versions is the only conflict-resolution scenario
GetLatest 0, 125 => 8, X <= The conflict can be based on the version that was already deleted.
GetLatest 0, 135 => 8, Y <= You can delete such a version too and, in parallel, undelete another version within one Snapshot (or in several Snapshots).
GetLatest 0, 145 => 8 <= You can delete the undeleted versions by inserting NULL.
GetLatest 0, 155 => 8, Z <= You can again undelete twice-deleted versions
GetLatest 0, 165 => 8 <= You can again delete three-times deleted versions
GetLatest 0, 10000 => 8 <= This means that in order to fast-forward view from moment 0 to moment `10000` you just have to expose record 8 to the user.
GetLatest 55, 115 => 8, [Remove 4], [Remove 5] <= At moment 55 there were two versions [4, 5] so in order to fast-forward to moment 115 the user has to delete versions 4 and 5 and introduce version 8. Please note that version 7 is not present in results since at moment 110 it got deleted.
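To make the time-span semantics concrete, here is a minimal sketch of how such a set of instructions could be derived. It is not the actual GetLatest procedure, whose code is not part of this post. It assumes that a version is visible at moment T when UpdatedOnCurr <= T < UpdatedOnNext and its Json is not NULL, and that only denormalized rows are consulted. The 'Expose' and 'Remove' labels and the Visibility name are hypothetical.

```sql
-- Hypothetical sketch, NOT the real GetLatest stored procedure.
-- Assumption: a version is "visible" at moment T when
--   UpdatedOnCurr <= T < UpdatedOnNext AND Json IS NOT NULL.
DECLARE @From BIGINT = 55, @To BIGINT = 115;

WITH Visibility AS
(
    SELECT Id,
           CASE WHEN UpdatedOnCurr <= @From AND @From < UpdatedOnNext
                     AND Json IS NOT NULL THEN 1 ELSE 0 END AS VisibleAtFrom,
           CASE WHEN UpdatedOnCurr <= @To AND @To < UpdatedOnNext
                     AND Json IS NOT NULL THEN 1 ELSE 0 END AS VisibleAtTo
    FROM dbo.Items
    WHERE Denormalized = 1
)
-- Only versions whose visibility changed between the two moments
-- need to be sent to the client.
SELECT Id,
       CASE WHEN VisibleAtTo = 1 THEN 'Expose' ELSE 'Remove' END AS Instruction
FROM Visibility
WHERE VisibleAtFrom <> VisibleAtTo;
```

Under these assumptions, the span 55 to 115 from the list above would yield version 8 to expose and versions 4 and 5 to remove, while version 7 never appears because it was neither visible at 55 nor at 115.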
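For GetLatest to support such an efficient interface, each record has to carry the special service attributes BranchId, RecoveredOn, CreatedOn, UpdatedOnPrev, UpdatedOnCurr, UpdatedOnNext and UpdatedOnNextId, which GetLatest uses to work out whether the record falls within the time span given by the GetLatest arguments.
To support eventual consistency, transaction safety and performance, data is inserted into the database through a special multistage procedure.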
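1. The data is just inserted into the database, without being queryable by the GetLatest stored procedure.
2. The data is made available to the GetLatest stored procedure in a normalized (i.e. denormalized = 0) state. While the data is in the normalized state, the service fields BranchId, RecoveredOn, CreatedOn, UpdatedOnPrev, UpdatedOnCurr, UpdatedOnNext and UpdatedOnNextId are computed on the fly, which is really slow.
3. To speed things up, the data is denormalized as soon as it becomes available to the GetLatest stored procedure.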
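The denormalization stage could be sketched roughly as follows. This is an illustration under assumptions, not the code from InsertSnapshot (which is not shown in this post): it assumes the Denormalizer view returns one row per item with the computed service fields, and that persisting those values and flipping the Denormalized flag is all the stage needs to do.

```sql
-- Hypothetical sketch of the denormalization stage; the real
-- InsertSnapshot code may differ.
-- Copies the service fields computed by the Denormalizer view into
-- Items and marks the affected rows as denormalized.
UPDATE I
SET I.BranchId        = D.BranchID,
    I.RecoveredOn     = D.RecoveredOn,
    I.CreatedOn       = D.CreatedOn,
    I.UpdatedOnPrev   = D.UpdatedOnPrev,
    I.UpdatedOnCurr   = D.UpdatedOnCurr,
    I.UpdatedOnNext   = D.UpdatedOnNext,
    I.UpdatedOnNextId = D.UpdatedOnNextId,
    I.Denormalized    = 1
FROM dbo.Items AS I
INNER JOIN dbo.Denormalizer AS D ON D.Id = I.Id
WHERE I.Denormalized = 0;  -- only rows still in the normalized state
```

Because this is a single statement, it either applies to all matching rows or to none, which fits the idea that an interrupted run can simply be repeated later.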
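Any data left in an intermediate state gets repaired in the InsertSnapshot call. The code for this part can be found between steps 2 and 3 of the InsertSnapshot stored procedure.
A new feature (required by the business) forced me to refactor the special Denormalizer view, which ties all the features together and is used by both GetLatest and InsertSnapshot. After that I started running into performance problems. Where SELECT * FROM Denormalizer originally executed in a fraction of a second, it now takes nearly 5 minutes to process 10000 records.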
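I am not a database expert, and it took me almost six months to arrive at the current database structure. I spent two weeks first on the refactoring and then on trying to find the root cause of my performance problem. I just can't find it. I am providing a database backup (you can find it here), because the schema (with all its indexes) is too large to fit into SqlFiddle; the database also contains outdated data (10000+ records) that I use for testing purposes. I am also providing the text of the Denormalizer view, which was refactored and became painfully slow: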
ALTER VIEW [dbo].[Denormalizer]
AS
-- Computed: for every normalized item of a finished operation, the FinishedOn
-- of its parent's operation, of its own operation, and of its earliest child operation.
WITH Computed AS
(
    SELECT currItem.Id,
           nextOperation.id AS NextId,
           prevOperation.FinishedOn AS PrevComputed,
           currOperation.FinishedOn AS CurrComputed,
           nextOperation.FinishedOn AS NextComputed
    FROM Items currItem
    INNER JOIN dbo.Operations AS currOperation ON currItem.OperationId = currOperation.Id
    LEFT OUTER JOIN dbo.Items AS prevItem ON currItem.PreviousId = prevItem.Id
    LEFT OUTER JOIN dbo.Operations AS prevOperation ON prevItem.OperationId = prevOperation.Id
    LEFT OUTER JOIN
    (
        -- Per parent item: smallest child Id and earliest child FinishedOn
        SELECT MIN(I.id) as id, S.PreviousId, S.FinishedOn
        FROM Items I
        INNER JOIN
        (
            SELECT I.PreviousId, MIN(nxt.FinishedOn) AS FinishedOn
            FROM dbo.Items I
            LEFT OUTER JOIN dbo.Operations AS nxt ON I.OperationId = nxt.Id
            GROUP BY I.PreviousId
        ) AS S ON I.PreviousId = S.PreviousId
        GROUP BY S.PreviousId, S.FinishedOn
    ) AS nextOperation ON nextOperation.PreviousId = currItem.Id
    WHERE currOperation.Finished = 1 AND currItem.Denormalized = 0
),
-- RecursionInitialization: level-0 rows. Normalized items get provisional service
-- fields from Computed; denormalized items reuse the values already stored in Items.
RecursionInitialization AS
(
    SELECT currItem.Id,
           currItem.PreviousId,
           currItem.UUID,
           currItem.Json,
           currItem.TableName,
           currItem.OperationId,
           currItem.PermissionId,
           currItem.Denormalized,
           currItem.Id AS BranchID,
           COALESCE (C.PrevComputed, C.CurrComputed) AS CreatedOn,
           COALESCE (C.PrevComputed, CAST(0 AS BIGINT)) AS RecoveredOn,
           COALESCE (C.PrevComputed, CAST(0 AS BIGINT)) AS UpdatedOnPrev,
           C.CurrComputed AS UpdatedOnCurr,
           COALESCE (C.NextComputed, CAST(8640000000000000 AS BIGINT)) AS UpdatedOnNext,
           C.NextId AS UpdatedOnNextId,
           0 AS RecursionLevel
    FROM Items AS currItem
    INNER JOIN Computed AS C ON currItem.Id = C.Id
    WHERE currItem.Denormalized = 0
    UNION ALL
    SELECT currItem.Id,
           currItem.PreviousId,
           currItem.UUID,
           currItem.Json,
           currItem.TableName,
           currItem.OperationId,
           currItem.PermissionId,
           currItem.Denormalized,
           currItem.BranchId,
           currItem.CreatedOn,
           currItem.RecoveredOn,
           currItem.UpdatedOnPrev,
           currItem.UpdatedOnCurr,
           currItem.UpdatedOnNext,
           currItem.UpdatedOnNextId,
           0 AS RecursionLevel
    FROM Items AS currItem
    WHERE currItem.Denormalized = 1
),
-- Recursion: walks each chain from parent to child and derives the service
-- fields of a normalized item from the row of its parent.
Recursion AS
(
    SELECT *
    FROM RecursionInitialization AS currItem
    UNION ALL
    SELECT currItem.Id,
           currItem.PreviousId,
           currItem.UUID,
           currItem.Json,
           currItem.TableName,
           currItem.OperationId,
           currItem.PermissionId,
           currItem.Denormalized,
           CASE
               WHEN prevItem.UpdatedOnNextId = currItem.Id
               THEN prevItem.BranchID
               ELSE currItem.Id
           END AS BranchID,
           prevItem.CreatedOn AS CreatedOn,
           CASE
               WHEN prevItem.Json IS NULL
               THEN CASE
                        WHEN currItem.Json IS NULL
                        THEN prevItem.RecoveredOn
                        ELSE C.CurrComputed
                    END
               ELSE prevItem.RecoveredOn
           END AS RecoveredOn,
           prevItem.UpdatedOnCurr AS UpdatedOnPrev,
           C.CurrComputed AS UpdatedOnCurr,
           COALESCE (C.NextComputed, CAST(8640000000000000 AS BIGINT)) AS UpdatedOnNext,
           C.NextId,
           prevItem.RecursionLevel + 1 AS RecursionLevel
    FROM Items currItem
    INNER JOIN Computed C ON currItem.Id = C.Id
    INNER JOIN Recursion AS prevItem ON currItem.PreviousId = prevItem.Id
    WHERE currItem.Denormalized = 0
)
-- An item can be reached at several recursion levels; keep only its deepest row.
SELECT item.Id,
       item.PreviousId,
       item.UUID,
       item.Json,
       item.TableName,
       item.OperationId,
       item.PermissionId,
       item.Denormalized,
       item.BranchID,
       item.CreatedOn,
       item.RecoveredOn,
       item.UpdatedOnPrev,
       item.UpdatedOnCurr,
       item.UpdatedOnNext,
       item.UpdatedOnNextId
FROM Recursion AS item
INNER JOIN
(
    SELECT Id, MAX(RecursionLevel) AS Recursion
    FROM Recursion AS item
    GROUP BY Id
) AS nested ON item.Id = nested.Id AND item.RecursionLevel = nested.Recursion
GO
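There are two scenarios to take into consideration, the denormalized and the normalized case:
1. Looking at the original backup, what makes SELECT * FROM Denormalizer so painfully slow? I feel there is a problem with the recursive part of the Denormalizer view. I have tried restricting it with denormalized = 1, but none of my actions affected performance.
2. After running UPDATE Items SET Denormalized = 0, both GetLatest and SELECT * FROM Denormalizer run into the (originally thought to be) slow scenario. Is there a way to speed things up while the service fields BranchId, RecoveredOn, CreatedOn, UpdatedOnPrev, UpdatedOnCurr, UpdatedOnNext and UpdatedOnNextId are being computed?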
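Thank you in advance.
I'm trying to stick to standard SQL so that the queries can easily be ported to other databases such as MySQL/Oracle/SQLite in the future, but if no standard SQL helps, I'm OK with sticking to database-specific constructs.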
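@Lu4 .. I voted to close this question as "tip of the iceberg", but with a query hint you will be able to run it in under 1 second. This query can be refactored and could use CROSS APPLY, but that would be a consulting gig rather than an answer on a Q&A site.
Your query runs for more than 13 minutes on my server with 4 CPUs and 16 GB of RAM.
I changed your query to use OPTION (MERGE JOIN), and it ran in under 1 second: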
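As an illustration of the kind of refactoring the comment hints at, the nested nextOperation derived table of the view could be expressed with OUTER APPLY. This is only a sketch under assumptions: the commenter did not show any code, and whether this form is actually faster on this data has not been measured here. OUTER APPLY (rather than CROSS APPLY) is used so that items without children are kept, as with the original LEFT OUTER JOIN.

```sql
-- Hypothetical rewrite of the nextOperation lookup using OUTER APPLY.
-- For each normalized item it returns the smallest child Id and the
-- earliest FinishedOn among the operations of its children, or NULLs
-- when the item has no children.
SELECT currItem.Id,
       nextOperation.NextId,
       nextOperation.NextComputed
FROM dbo.Items AS currItem
OUTER APPLY
(
    SELECT MIN(child.Id)      AS NextId,
           MIN(op.FinishedOn) AS NextComputed
    FROM dbo.Items AS child
    LEFT OUTER JOIN dbo.Operations AS op ON child.OperationId = op.Id
    WHERE child.PreviousId = currItem.Id
) AS nextOperation
WHERE currItem.Denormalized = 0;
```

The aggregate without GROUP BY always yields exactly one row per outer item, so the row count matches what the LEFT OUTER JOIN against the grouped derived table produces.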
set nocount on
set statistics io on
set statistics time on
;WITH Computed AS
(
    SELECT currItem.Id,
           nextOperation.id AS NextId,
           prevOperation.FinishedOn AS PrevComputed,
           currOperation.FinishedOn AS CurrComputed,
           nextOperation.FinishedOn AS NextComputed
    FROM Items currItem
    INNER JOIN dbo.Operations AS currOperation ON currItem.OperationId = currOperation.Id
    LEFT OUTER JOIN dbo.Items AS prevItem ON currItem.PreviousId = prevItem.Id
    LEFT OUTER JOIN dbo.Operations AS prevOperation ON prevItem.OperationId = prevOperation.Id
    LEFT OUTER JOIN
    (
        SELECT MIN(I.id) as id, S.PreviousId, S.FinishedOn
        FROM Items I
        INNER JOIN
        (
            SELECT I.PreviousId, MIN(nxt.FinishedOn) AS FinishedOn
            FROM dbo.Items I
            LEFT OUTER JOIN dbo.Operations AS nxt ON I.OperationId = nxt.Id
            GROUP BY I.PreviousId
        ) AS S ON I.PreviousId = S.PreviousId
        GROUP BY S.PreviousId, S.FinishedOn
    ) AS nextOperation ON nextOperation.PreviousId = currItem.Id
    WHERE currOperation.Finished = 1 AND currItem.Denormalized = 0
),
RecursionInitialization AS
(
    SELECT currItem.Id,
           currItem.PreviousId,
           currItem.UUID,
           currItem.Json,
           currItem.TableName,
           currItem.OperationId,
           currItem.PermissionId,
           currItem.Denormalized,
           currItem.Id AS BranchID,
           COALESCE (C.PrevComputed, C.CurrComputed) AS CreatedOn,
           COALESCE (C.PrevComputed, CAST(0 AS BIGINT)) AS RecoveredOn,
           COALESCE (C.PrevComputed, CAST(0 AS BIGINT)) AS UpdatedOnPrev,
           C.CurrComputed AS UpdatedOnCurr,
           COALESCE (C.NextComputed, CAST(8640000000000000 AS BIGINT)) AS UpdatedOnNext,
           C.NextId AS UpdatedOnNextId,
           0 AS RecursionLevel
    FROM Items AS currItem
    INNER JOIN Computed AS C ON currItem.Id = C.Id
    WHERE currItem.Denormalized = 0
    UNION ALL
    SELECT currItem.Id,
           currItem.PreviousId,
           currItem.UUID,
           currItem.Json,
           currItem.TableName,
           currItem.OperationId,
           currItem.PermissionId,
           currItem.Denormalized,
           currItem.BranchId,
           currItem.CreatedOn,
           currItem.RecoveredOn,
           currItem.UpdatedOnPrev,
           currItem.UpdatedOnCurr,
           currItem.UpdatedOnNext,
           currItem.UpdatedOnNextId,
           0 AS RecursionLevel
    FROM Items AS currItem
    WHERE currItem.Denormalized = 1
),
Recursion AS
(
    SELECT *
    FROM RecursionInitialization AS currItem
    UNION ALL
    SELECT currItem.Id,
           currItem.PreviousId,
           currItem.UUID,
           currItem.Json,
           currItem.TableName,
           currItem.OperationId,
           currItem.PermissionId,
           currItem.Denormalized,
           CASE
               WHEN prevItem.UpdatedOnNextId = currItem.Id
               THEN prevItem.BranchID
               ELSE currItem.Id
           END AS BranchID,
           prevItem.CreatedOn AS CreatedOn,
           CASE
               WHEN prevItem.Json IS NULL
               THEN CASE
                        WHEN currItem.Json IS NULL
                        THEN prevItem.RecoveredOn
                        ELSE C.CurrComputed
                    END
               ELSE prevItem.RecoveredOn
           END AS RecoveredOn,
           prevItem.UpdatedOnCurr AS UpdatedOnPrev,
           C.CurrComputed AS UpdatedOnCurr,
           COALESCE (C.NextComputed, CAST(8640000000000000 AS BIGINT)) AS UpdatedOnNext,
           C.NextId,
           prevItem.RecursionLevel + 1 AS RecursionLevel
    FROM Items currItem
    INNER JOIN Computed C ON currItem.Id = C.Id
    INNER JOIN Recursion AS prevItem ON currItem.PreviousId = prevItem.Id
    WHERE currItem.Denormalized = 0
)
SELECT item.Id,
       item.PreviousId,
       item.UUID,
       item.Json,
       item.TableName,
       item.OperationId,
       item.PermissionId,
       item.Denormalized,
       item.BranchID,
       item.CreatedOn,
       item.RecoveredOn,
       item.UpdatedOnPrev,
       item.UpdatedOnCurr,
       item.UpdatedOnNext,
       item.UpdatedOnNextId
FROM Recursion AS item
INNER JOIN
(
    SELECT Id, MAX(RecursionLevel) AS Recursion
    FROM Recursion AS item
    GROUP BY Id
) AS nested ON item.Id = nested.Id AND item.RecursionLevel = nested.Recursion
OPTION (MERGE JOIN)
set nocount off
set statistics io off
set statistics time off
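A view definition cannot carry an OPTION clause, so the hint cannot be baked into Denormalizer itself. One way to apply the same idea without inlining the whole query is to attach the hint to the statement that selects from the view, as in the sketch below. Whether this gives the same plan and timing as the inlined query above is an assumption that would need to be checked against the actual database.

```sql
SET STATISTICS TIME ON;

-- The hint goes on the outer statement: the view is expanded into
-- the query, so the hint applies to the joins inside it as well.
SELECT *
FROM dbo.Denormalizer
OPTION (MERGE JOIN);

SET STATISTICS TIME OFF;
```

Note that OPTION (MERGE JOIN) constrains every join in the statement. If the optimizer cannot build a plan that satisfies the hint, the query fails with an error instead of falling back to another join type, so callers such as GetLatest and InsertSnapshot would each need the hint on their own statements and some testing.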