Recursive SQL query performance issue

Lu4*_*Lu4 9 performance sql-server sql-server-2012 query-performance

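This is my sixth attempt at asking this question, and the shortest one. All previous attempts ended up closer to a blog post than a question, but I assure you that my problem is real. It simply concerns one large subject, and without all the details this question contains, it would not be clear what my problem is. So here goes...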

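Abstract

I have a database that stores data in a somewhat unusual way and provides several non-standard features required by my business process. The features are as follows:

  1. Non-destructive and non-blocking updates/deletes implemented through an insert-only approach, which allows data recovery and automatic logging (each change is tied to the user who made it)
  2. Multiversion data (there can be several versions of the same data)
  3. Database-level permissions
  4. Eventual consistency with ACID specification and transaction-safe creates/updates/deletes
  5. Ability to rewind or fast-forward the current view of the data to any point in time.

There may be other features that I forgot to mention.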

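Database structure

All user data is stored in the Items table as a JSON-encoded string (ntext). All database operations go through two stored procedures, GetLatest and InsertSnapshot, which allow operating on the data in a way similar to how Git operates on source files.

The resulting data is linked (JOINed) on the front end into a fully linked graph, so in most cases there is no need to query the database.

It is also possible to store the data in regular SQL columns instead of storing it in JSON-encoded form. However, that increases the overall complexity.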

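Reading data

GetLatest returns data in the form of instructions; consider the following diagram for an explanation:

[Image: structure diagram]

The diagram shows the evolution of changes made to a single record. The arrows show the version an edit was based on (imagine a user updating some data offline, in parallel with updates made by an online user; such a case introduces a conflict, which is basically two versions of the data instead of one).

So calling GetLatest with the following input time spans will produce the following record versions: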

GetLatest 0, 15  => 1       <= The data is created upon its first occurrence
GetLatest 0, 25  => 2       <= Inserting another version on top of the first one overwrites the existing version
GetLatest 0, 30  => 3       <= The overwrite takes place as soon as the data is inserted
GetLatest 0, 45  => 3, 4    <= This is where the conflict is introduced in the system
GetLatest 0, 55  => 4, 5    <= You can still edit all the versions
GetLatest 0, 65  => 4, 6    <= You can still edit all the versions
GetLatest 0, 75  => 4, 6, 7 <= You can also create additional conflicts
GetLatest 0, 85  => 4, 7, 8 <= You can still edit records
GetLatest 0, 95  => 7, 8, 9 <= You can still edit records
GetLatest 0, 105 => 7, 8    <= Inserting a record with `Json` equal to `NULL` means that the record is deleted
GetLatest 0, 115 => 8       <= Deleting the conflicting versions is the only conflict-resolution scenario
GetLatest 0, 125 => 8, X    <= The conflict can be based on the version that was already deleted.
GetLatest 0, 135 => 8, Y    <= You can delete such a version too and undelete another version in parallel within one Snapshot (or in several Snapshots).
GetLatest 0, 145 => 8       <= You can delete the undeleted versions by inserting NULL.
GetLatest 0, 155 => 8, Z    <= You can again undelete twice-deleted versions
GetLatest 0, 165 => 8       <= You can again delete three-times deleted versions
GetLatest 0, 10000 => 8     <= This means that in order to fast-forward view from moment 0 to moment `10000` you just have to expose record 8 to the user.
GetLatest 55, 115  => 8, [Remove 4], [Remove 5] <= At moment 55 there were two versions [4, 5] so in order to fast-forward to moment 115 the user has to delete versions 4 and 5 and introduce version 8. Please note that version 7 is not present in results since at moment 110 it got deleted.

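For GetLatest to support such an interface efficiently, each record contains the special service properties BranchId, RecoveredOn, CreatedOn, UpdatedOnPrev, UpdatedOnCurr, UpdatedOnNext, and UpdatedOnNextId, which GetLatest uses to figure out whether the record falls within the time span given by the GetLatest arguments.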
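
As a rough illustration only (the actual GetLatest code is not shown in this post), a point-in-time filter over two of these service properties might look like the sketch below. The table variable, the sample rows, and the @To variable are assumptions made for the example; they loosely mirror the first three versions in the list above.

```sql
-- Hypothetical sample data: three successive versions of one record.
DECLARE @Items TABLE
(
    Id            INT           NOT NULL,
    Json          NVARCHAR(MAX) NULL,      -- NULL would mean "deleted"
    UpdatedOnCurr BIGINT        NOT NULL,  -- moment this version appeared
    UpdatedOnNext BIGINT        NOT NULL   -- moment the next version replaced it
);

INSERT INTO @Items (Id, Json, UpdatedOnCurr, UpdatedOnNext)
VALUES (1, N'{"v":1}', 10, 20),
       (2, N'{"v":2}', 20, 30),
       (3, N'{"v":3}', 30, 8640000000000000); -- "never replaced" sentinel, as in the view

-- Assumed target moment of the requested time span.
DECLARE @To BIGINT = 25;

-- A version is visible at @To if it already exists, has not yet been
-- replaced, and is not a deletion marker.
SELECT Id
FROM @Items
WHERE UpdatedOnCurr <= @To
  AND UpdatedOnNext >  @To
  AND Json IS NOT NULL;
```

With @To = 25 this returns only Id 2, which matches the "GetLatest 0, 25 => 2" line. The real procedure also has to handle branches, recovery, and the [Remove N] instructions, none of which this sketch attempts.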

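Inserting data

To support eventual consistency, transaction safety, and performance, data is inserted into the database through a special multistage procedure.

  1. Data is just inserted into the database, without being queryable by the GetLatest stored procedure.

  2. Data is made available to the GetLatest stored procedure in the normalized (i.e. denormalized = 0) state. While the data is in the normalized state, the service fields BranchId, RecoveredOn, CreatedOn, UpdatedOnPrev, UpdatedOnCurr, UpdatedOnNext, and UpdatedOnNextId are computed on the fly, which is really slow.

  3. To speed things up, the data is denormalized as soon as it is made available to the GetLatest stored procedure.

    • Since steps 1, 2, and 3 are performed in different transactions, a hardware failure can occur in the middle of any operation, leaving the data in an intermediate state. Such a situation is normal, and even if it happens, the data will be healed by a subsequent InsertSnapshot call. The code for this part can be found between steps 2 and 3 of the InsertSnapshot stored procedure.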

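The problem

A new feature (required by the business) forced me to refactor the special Denormalizer view, which ties all the features together and is used by both GetLatest and InsertSnapshot. After that, I started experiencing performance problems. Where SELECT * FROM Denormalizer originally executed in a fraction of a second, it now takes nearly 5 minutes to process 10000 records.

I am not a database expert, and it took me almost six months to arrive at the current database structure. I first spent two weeks on the refactoring and then trying to figure out the root cause of my performance problem. I just can't find it. I am providing a database backup (which you can find here) because the schema (with all indexes) is too large to fit into SqlFiddle; the database also contains the outdated data (10000+ records) that I use for testing purposes. I am also providing the text of the Denormalizer view, which was refactored and became painfully slow: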

ALTER VIEW [dbo].[Denormalizer]
AS
WITH Computed AS
(
    SELECT  currItem.Id,
            nextOperation.id AS NextId,
            prevOperation.FinishedOn AS PrevComputed,
            currOperation.FinishedOn AS CurrComputed,
            nextOperation.FinishedOn AS NextComputed

    FROM Items currItem 
    INNER JOIN dbo.Operations AS currOperation ON currItem.OperationId = currOperation.Id

    LEFT OUTER JOIN dbo.Items AS prevItem ON currItem.PreviousId = prevItem.Id
    LEFT OUTER JOIN dbo.Operations AS prevOperation ON prevItem.OperationId = prevOperation.Id 
    LEFT OUTER JOIN
    (
        SELECT MIN(I.id) as id, S.PreviousId, S.FinishedOn
        FROM Items I
        INNER JOIN
        (
            SELECT I.PreviousId, MIN(nxt.FinishedOn) AS FinishedOn
            FROM dbo.Items I
            LEFT OUTER JOIN dbo.Operations AS nxt ON I.OperationId = nxt.Id
            GROUP BY I.PreviousId
        ) AS S ON I.PreviousId = S.PreviousId 
        GROUP BY S.PreviousId, S.FinishedOn
    ) AS nextOperation ON nextOperation.PreviousId = currItem.Id

    WHERE currOperation.Finished = 1 AND currItem.Denormalized = 0
),

RecursionInitialization AS
(
    SELECT  currItem.Id,
            currItem.PreviousId,
            currItem.UUID,
            currItem.Json,
            currItem.TableName,
            currItem.OperationId,
            currItem.PermissionId,
            currItem.Denormalized,
            currItem.Id AS BranchID,
            COALESCE (C.PrevComputed, C.CurrComputed) AS CreatedOn,
            COALESCE (C.PrevComputed, CAST(0 AS BIGINT)) AS RecoveredOn,
            COALESCE (C.PrevComputed, CAST(0 AS BIGINT)) AS UpdatedOnPrev,
            C.CurrComputed AS UpdatedOnCurr,
            COALESCE (C.NextComputed, CAST(8640000000000000 AS BIGINT)) AS UpdatedOnNext,
            C.NextId AS UpdatedOnNextId,

            0 AS RecursionLevel

    FROM Items AS currItem
    INNER JOIN Computed AS C ON currItem.Id = C.Id
    WHERE currItem.Denormalized = 0

    UNION ALL

    SELECT  currItem.Id,
            currItem.PreviousId,
            currItem.UUID,
            currItem.Json,
            currItem.TableName,
            currItem.OperationId,
            currItem.PermissionId,
            currItem.Denormalized,
            currItem.BranchId,
            currItem.CreatedOn,
            currItem.RecoveredOn,
            currItem.UpdatedOnPrev,
            currItem.UpdatedOnCurr,
            currItem.UpdatedOnNext,
            currItem.UpdatedOnNextId,

            0 AS RecursionLevel

    FROM Items AS currItem
    WHERE currItem.Denormalized = 1
),
Recursion AS
(
    SELECT *
    FROM RecursionInitialization AS currItem

    UNION ALL

    SELECT  currItem.Id,
            currItem.PreviousId,
            currItem.UUID,
            currItem.Json,
            currItem.TableName,
            currItem.OperationId,
            currItem.PermissionId,
            currItem.Denormalized,

            CASE
                WHEN prevItem.UpdatedOnNextId = currItem.Id
                THEN prevItem.BranchID
                ELSE currItem.Id
            END AS BranchID,

            prevItem.CreatedOn AS CreatedOn,

            CASE
                WHEN prevItem.Json IS NULL
                THEN CASE
                            WHEN currItem.Json IS NULL
                            THEN prevItem.RecoveredOn
                            ELSE C.CurrComputed
                        END
                ELSE prevItem.RecoveredOn
            END AS RecoveredOn,

            prevItem.UpdatedOnCurr AS UpdatedOnPrev,

            C.CurrComputed AS UpdatedOnCurr,

            COALESCE (C.NextComputed, CAST(8640000000000000 AS BIGINT)) AS UpdatedOnNext,

            C.NextId,

            prevItem.RecursionLevel + 1 AS RecursionLevel
    FROM Items currItem
    INNER JOIN Computed C ON currItem.Id = C.Id
    INNER JOIN Recursion AS prevItem ON currItem.PreviousId = prevItem.Id
    WHERE currItem.Denormalized = 0
)
SELECT  item.Id,
        item.PreviousId,
        item.UUID,
        item.Json,
        item.TableName,
        item.OperationId,
        item.PermissionId,
        item.Denormalized,
        item.BranchID,
        item.CreatedOn,
        item.RecoveredOn,
        item.UpdatedOnPrev,
        item.UpdatedOnCurr,
        item.UpdatedOnNext,
        item.UpdatedOnNextId

FROM Recursion AS item
INNER JOIN
(
    SELECT Id, MAX(RecursionLevel) AS Recursion
    FROM Recursion AS item
    GROUP BY Id
) AS nested ON item.Id = nested.Id AND item.RecursionLevel = nested.Recursion
GO

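The questions

There are two scenarios to consider, the denormalized and the normalized case:

  1. Looking at the original backup, what makes SELECT * FROM Denormalizer so slow? I feel there is a problem with the recursive part of the Denormalizer view. I have tried restricting it to denormalized = 1, but nothing I did affected performance.

  2. After running UPDATE Items SET Denormalized = 0, both GetLatest and SELECT * FROM Denormalizer run into the scenario that was expected to be slow from the start. Is there a way to speed things up when the service fields BranchId, RecoveredOn, CreatedOn, UpdatedOnPrev, UpdatedOnCurr, UpdatedOnNext, and UpdatedOnNextId are being computed?

Thank you in advance

P.S.

I am trying to stick to standard SQL so that the queries can easily be ported to other databases such as MySQL/Oracle/SQLite in the future, but if there is no standard SQL that helps, I am fine with sticking to database-specific constructs.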

Kin*_*hah 9

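@Lu4 .. I voted to close this question as "tip of the iceberg", but with a query hint you will be able to run it in under 1 second. The query can be refactored and could use CROSS APPLY, but that would be a consulting job rather than an answer on a Q&A site.

As written, your query runs for more than 13 minutes on my server with 4 CPUs and 16 GB of RAM.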

I changed your query to use OPTION (MERGE JOIN), and it ran in under 1 second:

set nocount on 
set statistics io on
set statistics time on
;WITH Computed AS
(
    SELECT  currItem.Id,
            nextOperation.id AS NextId,
            prevOperation.FinishedOn AS PrevComputed,
            currOperation.FinishedOn AS CurrComputed,
            nextOperation.FinishedOn AS NextComputed

    FROM Items currItem 
    INNER JOIN dbo.Operations AS currOperation ON currItem.OperationId = currOperation.Id

    LEFT OUTER JOIN dbo.Items AS prevItem ON currItem.PreviousId = prevItem.Id
    LEFT OUTER JOIN dbo.Operations AS prevOperation ON prevItem.OperationId = prevOperation.Id 
    LEFT OUTER JOIN
    (
        SELECT MIN(I.id) as id, S.PreviousId, S.FinishedOn
        FROM Items I
        INNER JOIN
        (
            SELECT I.PreviousId, MIN(nxt.FinishedOn) AS FinishedOn
            FROM dbo.Items I
            LEFT OUTER JOIN dbo.Operations AS nxt ON I.OperationId = nxt.Id
            GROUP BY I.PreviousId
        ) AS S ON I.PreviousId = S.PreviousId 
        GROUP BY S.PreviousId, S.FinishedOn
    ) AS nextOperation ON nextOperation.PreviousId = currItem.Id

    WHERE currOperation.Finished = 1 AND currItem.Denormalized = 0
),

RecursionInitialization AS
(
    SELECT  currItem.Id,
            currItem.PreviousId,
            currItem.UUID,
            currItem.Json,
            currItem.TableName,
            currItem.OperationId,
            currItem.PermissionId,
            currItem.Denormalized,
            currItem.Id AS BranchID,
            COALESCE (C.PrevComputed, C.CurrComputed) AS CreatedOn,
            COALESCE (C.PrevComputed, CAST(0 AS BIGINT)) AS RecoveredOn,
            COALESCE (C.PrevComputed, CAST(0 AS BIGINT)) AS UpdatedOnPrev,
            C.CurrComputed AS UpdatedOnCurr,
            COALESCE (C.NextComputed, CAST(8640000000000000 AS BIGINT)) AS UpdatedOnNext,
            C.NextId AS UpdatedOnNextId,

            0 AS RecursionLevel

    FROM Items AS currItem
    INNER JOIN Computed AS C ON currItem.Id = C.Id
    WHERE currItem.Denormalized = 0

    UNION ALL

    SELECT  currItem.Id,
            currItem.PreviousId,
            currItem.UUID,
            currItem.Json,
            currItem.TableName,
            currItem.OperationId,
            currItem.PermissionId,
            currItem.Denormalized,
            currItem.BranchId,
            currItem.CreatedOn,
            currItem.RecoveredOn,
            currItem.UpdatedOnPrev,
            currItem.UpdatedOnCurr,
            currItem.UpdatedOnNext,
            currItem.UpdatedOnNextId,

            0 AS RecursionLevel

    FROM Items AS currItem
    WHERE currItem.Denormalized = 1
),
Recursion AS
(
    SELECT *
    FROM RecursionInitialization AS currItem

    UNION ALL

    SELECT  currItem.Id,
            currItem.PreviousId,
            currItem.UUID,
            currItem.Json,
            currItem.TableName,
            currItem.OperationId,
            currItem.PermissionId,
            currItem.Denormalized,

            CASE
                WHEN prevItem.UpdatedOnNextId = currItem.Id
                THEN prevItem.BranchID
                ELSE currItem.Id
            END AS BranchID,

            prevItem.CreatedOn AS CreatedOn,

            CASE
                WHEN prevItem.Json IS NULL
                THEN CASE
                            WHEN currItem.Json IS NULL
                            THEN prevItem.RecoveredOn
                            ELSE C.CurrComputed
                        END
                ELSE prevItem.RecoveredOn
            END AS RecoveredOn,

            prevItem.UpdatedOnCurr AS UpdatedOnPrev,

            C.CurrComputed AS UpdatedOnCurr,

            COALESCE (C.NextComputed, CAST(8640000000000000 AS BIGINT)) AS UpdatedOnNext,

            C.NextId,

            prevItem.RecursionLevel + 1 AS RecursionLevel
    FROM Items currItem
    INNER JOIN Computed C ON currItem.Id = C.Id
    INNER JOIN Recursion AS prevItem ON currItem.PreviousId = prevItem.Id
    WHERE currItem.Denormalized = 0
)
SELECT  item.Id,
        item.PreviousId,
        item.UUID,
        item.Json,
        item.TableName,
        item.OperationId,
        item.PermissionId,
        item.Denormalized,
        item.BranchID,
        item.CreatedOn,
        item.RecoveredOn,
        item.UpdatedOnPrev,
        item.UpdatedOnCurr,
        item.UpdatedOnNext,
        item.UpdatedOnNextId

FROM Recursion AS item
INNER JOIN
(
    SELECT Id, MAX(RecursionLevel) AS Recursion
    FROM Recursion AS item
    GROUP BY Id
) AS nested ON item.Id = nested.Id AND item.RecursionLevel = nested.Recursion
OPTION (MERGE JOIN)

set nocount off
set statistics io off
set statistics time off

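Note that you cannot use query hints inside a view definition, so you will have to find an alternative, such as exposing the view through a stored procedure, or some other workaround.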
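
One possible shape for the stored-procedure workaround is sketched below. It assumes the dbo.Denormalizer view from the question already exists; the procedure name GetDenormalized is made up for the example. It has not been run against the posted backup, so treat it as a starting point rather than a verified fix.

```sql
-- Hypothetical wrapper procedure; the name GetDenormalized is an assumption.
CREATE PROCEDURE dbo.GetDenormalized
AS
BEGIN
    SET NOCOUNT ON;

    -- The hint is attached to the outer statement, not to the view
    -- definition. Query hints apply to the whole statement, so the joins
    -- of the expanded view are covered as well.
    SELECT *
    FROM dbo.Denormalizer
    OPTION (MERGE JOIN);
END
GO

-- Usage: call the procedure instead of selecting from the view directly.
EXEC dbo.GetDenormalized;
```

Since GetLatest and InsertSnapshot are already stored procedures, the same OPTION (MERGE JOIN) clause could presumably be appended to whichever of their statements reference the view. Whether the optimizer can still build a plan under the hint for those statements would need to be checked case by case.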