SQL Server lock timeout exceeded when deleting records in a loop

Pau*_*ams 8 sql sql-server locking sql-server-2005

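I am testing a process that deletes a very large number of records at once. It cannot use TRUNCATE TABLE, because some records in the table need to stay.

Because of the volume, I have broken the delete into a loop similar to this: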

-- Do not block if records are locked.
SET LOCK_TIMEOUT 0
-- This process should be chosen as a deadlock victim in the case of a deadlock.
SET DEADLOCK_PRIORITY LOW
SET NOCOUNT ON

DECLARE @Count INT
SET @Count = 1
WHILE @Count > 0
BEGIN TRY
    BEGIN TRANSACTION -- added per comment below

    DELETE TOP (1000) FROM MyTable WITH (ROWLOCK, READPAST) WHERE MyField = SomeValue
    SET @Count = @@ROWCOUNT

    COMMIT
END TRY
BEGIN CATCH
    exec sp_lock -- added to display the open locks after the timeout
    exec sp_who2 -- shows the active processes

    IF @@TRANCOUNT > 0
        ROLLBACK
    RETURN -- ignoring this error for brevity
END CATCH

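MyTable is a clustered table. MyField is the first column of the clustered index. It indicates a logical grouping of records, so MyField = SomeValue often selects many records. I don't care in which order they are deleted, as long as one group is processed at a time. There are no other indexes on this table.

I added the ROWLOCK hint to try to avoid the lock escalations we have seen in production. I added the READPAST hint to avoid deleting records locked by other processes. That should never happen, but I am trying to be safe.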

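Problem: sometimes this loop hits lock timeout 1222, "Lock request time out period exceeded", when it is the only thing running.

I am positive there is no other activity on this system while I am testing this process, because it is my own developer box, nobody else is connected, no other processes are running on it, and the profiler shows no activity.

I can re-run the same script a second later and it picks up where it left off, happily deleting records until the next lock timeout.

I have tried a BEGIN TRY / BEGIN CATCH to ignore the 1222 error and retry the delete, but it fails again immediately with the same lock timeout error. It also fails again if I add a short delay before retrying.

I assume the lock timeouts are caused by something like a page split, but I am not sure why this would conflict with the current loop iteration. The prior delete statement should have already completed, and I thought that meant any page splits were also finished.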

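Why is the DELETE loop hitting a lock timeout against itself?

Is there a way the process can avoid this lock timeout or detect that it is safe to resume?

This is on SQL Server 2005.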

-- EDIT --

I added the Lock:Timeout event to the profiler. It is timing out on a PAGELOCK during the delete:

Event Class: Lock:Timeout
TextData:    1:15634  (one example of several)
Mode:        7 - IU
Type:        6 - PAGE

DBCC PAGE reports that these pages are outside the range of the master database (ID 1).

-- EDIT 2 --

I added a BEGIN TRY / BEGIN CATCH and ran exec sp_lock in the catch block. Here is what I saw:

spid dbid ObjId      IndId Type Resource Mode Status
19   2    1401108082 1     PAG  1:52841  X    GRANT  (tempdb.dbo.MyTable)
19   2    1401108082 0     TAB           IX   GRANT  (tempdb.dbo.MyTable)
Me   2    1401108082 0     TAB           IX   GRANT  (tempdb.dbo.MyTable)
Me   1    1115151018 0     TAB           IS   GRANT  (master..spt_values)  (?)

SPID 19 is a SQL Server TASK MANAGER. Why would one of these task managers be acquiring locks on MyTable?

Pau*_*ams 6

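I've found the answer: my looped delete is conflicting with the ghost cleanup process.

Using Nicholas's suggestion, I added a BEGIN TRANSACTION and a COMMIT. I wrapped the delete loop in a BEGIN TRY / BEGIN CATCH. In the BEGIN CATCH, right before the ROLLBACK, I ran sp_lock and sp_who2. (I added the code changes to the question above.)

When my process was blocked, I saw the following output: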

spid   dbid   ObjId       IndId  Type Resource                         Mode     Status
------ ------ ----------- ------ ---- -------------------------------- -------- ------
20     2      1401108082  0      TAB                                   IX       GRANT
20     2      1401108082  1      PAG  1:102368                         X        GRANT

SPID  Status     Login HostName BlkBy DBName Command       CPUTime DiskIO
----  ---------- ----- -------- ----- ------ ------------- ------- ------
20    BACKGROUND sa    .        .     tempdb GHOST CLEANUP 31      0
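The same information can also be pulled from the dynamic management views instead of sp_lock and sp_who2. The following is only a sketch, assuming it is run from a second session in the database that holds the table (tempdb in my test); the views and columns are standard in SQL Server 2005, but the filtering is my own choice:

```sql
-- Sketch: list locks held or requested by other sessions in the current database,
-- together with the command each session is running (for example GHOST CLEANUP).
SELECT  l.request_session_id,
        r.command,
        r.status,
        l.resource_type,
        l.resource_description,
        l.request_mode,
        l.request_status
FROM    sys.dm_tran_locks AS l
LEFT JOIN sys.dm_exec_requests AS r
        ON r.session_id = l.request_session_id
WHERE   l.resource_database_id = DB_ID()
  AND   l.request_session_id <> @@SPID   -- ignore the session running this query
ORDER BY l.request_session_id
```

A background session that shows an X lock on a PAGE resource of the table would match the sp_lock output above.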

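For future reference: when SQL Server deletes records, it sets a bit on them that merely marks them as "ghost records". Every few minutes, an internal process called ghost cleanup runs to reclaim pages whose records have all been deleted (i.e. every record on the page is a ghost record).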
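To see the ghost records that the batched deletes leave behind, something like the following sketch should work. It assumes the placeholder table name dbo.MyTable from the question and must be run in the database that contains the table; ghost_record_count is only populated in SAMPLED or DETAILED mode:

```sql
-- Sketch: count ghost records per index level for the table being cleaned up.
DECLARE @DbId INT, @ObjId INT
SET @DbId  = DB_ID()
SET @ObjId = OBJECT_ID(N'dbo.MyTable')   -- placeholder table name

SELECT  index_id,
        index_level,
        page_count,
        record_count,
        ghost_record_count,
        version_ghost_record_count
FROM    sys.dm_db_index_physical_stats(@DbId, @ObjId, NULL, NULL, 'DETAILED')
```

Running it right after a batch and again a few minutes later should show the ghost count dropping once the cleanup task has visited the pages.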

The ghost cleanup process was discussed in a question on ServerFault.

Paul S. Randal has written an explanation of the ghost cleanup process.

It is possible to disable the ghost cleanup process with a trace flag, but I did not need to do so in this case.
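For completeness, here is a sketch of what that would look like. The flag number is not stated above; to my knowledge it is trace flag 661, so treat that number as an assumption and check it against the documentation for your version before using it. Disabling ghost cleanup means deleted space is not reclaimed until the flag is turned off again.

```sql
-- Sketch only: 661 is assumed to be the "disable ghost record removal" trace flag.
DBCC TRACEON (661, -1)    -- -1 applies the flag globally

-- Check whether the flag is currently enabled.
DBCC TRACESTATUS (661)

-- Turn ghost cleanup back on when done.
DBCC TRACEOFF (661, -1)
```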

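I ended up adding a lock wait timeout of 100 ms. This still causes occasional lock wait timeouts while the ghost record cleanup process is running, but that is acceptable. I also added a loop that retries lock timeouts up to 5 times. With these two changes, my process now usually completes. It now only times out if a very long-running process is pushing lots of data around and acquires table or page locks on the data my process needs to clean up.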
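A minimal sketch of that idea, a 100 ms lock timeout plus a bounded number of retries, might look like this. The names @MaxRetries and @SomeValue, the UNIQUEIDENTIFIER type, and the choice to count consecutive timeouts are assumptions for illustration, not part of my actual script:

```sql
SET LOCK_TIMEOUT 100          -- wait at most 100 ms for a lock
SET DEADLOCK_PRIORITY LOW

DECLARE @MaxRetries INT, @LockTimeoutCount INT, @RowCount INT
DECLARE @SomeValue UNIQUEIDENTIFIER   -- hypothetical: the group to delete, set by the caller
DECLARE @Msg NVARCHAR(2048)

SET @MaxRetries = 5
SET @LockTimeoutCount = 0
SET @RowCount = 1

WHILE @RowCount > 0 AND @LockTimeoutCount < @MaxRetries
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION

        DELETE TOP (1000) FROM MyTable WITH (ROWLOCK, READPAST)
        WHERE MyField = @SomeValue
        SET @RowCount = @@ROWCOUNT

        COMMIT
        SET @LockTimeoutCount = 0     -- a successful batch resets the retry counter
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK

        IF ERROR_NUMBER() = 1222      -- lock timeout: count it and try again
        BEGIN
            SET @LockTimeoutCount = @LockTimeoutCount + 1
            WAITFOR DELAY '00:00:00.100'
        END
        ELSE                          -- anything else: report it and stop
        BEGIN
            SET @Msg = ERROR_MESSAGE()
            RAISERROR(@Msg, 16, 1)
            BREAK
        END
    END CATCH
END
```

The loop ends either when a batch deletes no rows or after five lock timeouts in a row; in the second case the remaining rows are left for the next run.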

EDIT 2016-07-20

The final code looks like this:

-- Do not block long if records are locked.
SET LOCK_TIMEOUT 100

-- This process volunteers to be a deadlock victim in the case of a deadlock.
SET DEADLOCK_PRIORITY LOW

DECLARE @Error BIT
SET @Error = 0

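DECLARE @ErrMsg VARCHAR(1000)
DECLARE @SQL NVARCHAR(4000)          -- dynamic DELETE statement, built inside the loop
DECLARE @ProcGUID UNIQUEIDENTIFIER   -- value passed to sp_executesql; assigned by the actual script
DECLARE @DeletedCount INT
SELECT @DeletedCount = 0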

DECLARE @LockTimeoutCount INT
SET @LockTimeoutCount = 0

DECLARE @ContinueDeleting BIT,
    @LastDeleteSuccessful BIT

SET @ContinueDeleting = 1
SET @LastDeleteSuccessful = 1

WHILE @ContinueDeleting = 1
BEGIN
    DECLARE @RowCount INT
    SET @RowCount = 0

    BEGIN TRY

        BEGIN TRANSACTION

        -- The READPAST below attempts to skip over locked records.
        -- However, it might still cause a lock wait error (1222) if a page or index is locked, because the delete has to modify indexes.
        -- The threshold for row lock escalation to table locks is around 5,000 records,
        -- so keep the deleted number smaller than this limit in case we are deleting a large chunk of data.
        -- Table name, field, and value are all set dynamically in the actual script.
        SET @SQL = N'DELETE TOP (1000) MyTable WITH(ROWLOCK, READPAST) WHERE MyField = SomeValue' 
        EXEC sp_executesql @SQL, N'@ProcGuid uniqueidentifier', @ProcGUID

        SET @RowCount = @@ROWCOUNT

        COMMIT

        SET @LastDeleteSuccessful = 1

        SET @DeletedCount = @DeletedCount + @RowCount
        IF @RowCount = 0
        BEGIN
            SET @ContinueDeleting = 0
        END

    END TRY
    BEGIN CATCH

        IF @@TRANCOUNT > 0
            ROLLBACK

        IF Error_Number() = 1222 -- Lock timeout
        BEGIN

            IF @LastDeleteSuccessful = 1
            BEGIN
                -- If we hit a lock timeout, and we had already deleted something successfully, try again.
                SET @LastDeleteSuccessful = 0
            END
            ELSE
            BEGIN
                -- The last delete failed, too.  Give up for now.  The job will run again shortly.
                SET @ContinueDeleting = 0
            END
        END
        ELSE -- On anything other than a lock timeout, report an error.
        BEGIN       
            SET @ErrMsg = 'An error occurred cleaning up data.  Table: MyTable Column: MyColumn Value: SomeValue.  Message: ' + ERROR_MESSAGE() + ' Error Number: ' + CONVERT(VARCHAR(20), ERROR_NUMBER()) + ' Line: ' + CONVERT(VARCHAR(20), ERROR_LINE())
            PRINT @ErrMsg -- this error message will be included in the SQL Server job history
            SET @Error = 1
            SET @ContinueDeleting = 0
        END

    END CATCH

END

IF @Error <> 0
    RAISERROR('Not all data could be cleaned up.  See previous messages.', 16, 1)