差距和孤岛:客户端解决方案与 T-SQL 查询

A-K*_*A-K 10 performance sql-server gaps-and-islands query-performance

针对间隙和孤岛的 T-SQL 解决方案能否比在客户端上运行的 C# 解决方案运行得更快?

具体来说,让我们提供一些测试数据:

CREATE TABLE dbo.Numbers
  (
    n INT NOT NULL
          PRIMARY KEY
  ) ; 
GO 

INSERT  INTO dbo.Numbers
        ( n )
VALUES  ( 1 ) ; 
GO 
DECLARE @i INT ; 
SET @i = 0 ; 
WHILE @i < 21 
  BEGIN 
    INSERT  INTO dbo.Numbers
            ( n 
            )
            SELECT  n + POWER(2, @i)
            FROM    dbo.Numbers ; 
    SET @i = @i + 1 ; 
  END ;  
GO

CREATE TABLE dbo.Tasks
  (
    StartedAt SMALLDATETIME NOT NULL ,
    FinishedAt SMALLDATETIME NOT NULL ,
    CONSTRAINT PK_Tasks PRIMARY KEY ( StartedAt, FinishedAt ) ,
    CONSTRAINT UNQ_Tasks UNIQUE ( FinishedAt, StartedAt )
  ) ;
GO

INSERT  INTO dbo.Tasks
        ( StartedAt ,
          FinishedAt
        )
        SELECT  DATEADD(MINUTE, n, '20100101') AS StartedAt ,
                DATEADD(MINUTE, n + 2, '20100101') AS FinishedAt
        FROM    dbo.Numbers
        WHERE   ( n < 500000
                  OR n > 500005
                )
GO
Run Code Online (Sandbox Code Playgroud)

第一组测试数据正好有一个差距:

SELECT  StartedAt ,
        FinishedAt
FROM    dbo.Tasks
WHERE   StartedAt BETWEEN DATEADD(MINUTE, 499999, '20100101')
                  AND     DATEADD(MINUTE, 500006, '20100101')
Run Code Online (Sandbox Code Playgroud)

第二组测试数据有2M -1个gap,每两个相邻的区间之间有一个gap:

TRUNCATE TABLE dbo.Tasks;
GO

INSERT  INTO dbo.Tasks
        ( StartedAt ,
          FinishedAt
        )
        SELECT  DATEADD(MINUTE, 3*n, '20100101') AS StartedAt ,
                DATEADD(MINUTE, 3*n + 2, '20100101') AS FinishedAt
        FROM    dbo.Numbers
        WHERE   ( n < 500000
                  OR n > 500005
                )
GO
Run Code Online (Sandbox Code Playgroud)

目前我正在运行 2008 R2,但非常欢迎 2012 解决方案。我已经发布了我的 C# 解决方案作为答案。

A-K*_*A-K 4

以下 C# 代码解决了该问题:

    var connString =
        "Initial Catalog=MyDb;Data Source=MyServer;Integrated Security=SSPI;Application Name=Benchmarks;";

    var stopWatch = new Stopwatch();
    stopWatch.Start();

    using (var conn = new SqlConnection(connString))
    {
        conn.Open();
        var command = conn.CreateCommand();
        command.CommandText = "dbo.GetAllTaskEvents";
        command.CommandType = CommandType.StoredProcedure;
        var gaps = new List<string>();
        using (var dr = command.ExecuteReader())
        {
            var currentEvents = 0;
            var gapStart = new DateTime();
            var gapStarted = false;
            while (dr.Read())
            {
                var change = dr.GetInt32(1);
                if (change == -1 && currentEvents == 1)
                {
                    gapStart = dr.GetDateTime(0);
                    gapStarted = true;
                }
                else if (change == 1 && currentEvents == 0 && gapStarted)
                {
                    gaps.Add(string.Format("({0},{1})", gapStart, dr.GetDateTime(0)));
                    gapStarted = false;
                }
                currentEvents += change;
            }
        }
        File.WriteAllLines(@"C:\Temp\Gaps.txt", gaps);
    }

    stopWatch.Stop();
    System.Console.WriteLine("Elapsed: " + stopWatch.Elapsed);
Run Code Online (Sandbox Code Playgroud)

此代码调用此存储过程:

CREATE PROCEDURE dbo.GetAllTaskEvents
AS 
  BEGIN ;
    SELECT  EventTime ,
            Change
    FROM    ( SELECT  StartedAt AS EventTime ,
                      1 AS Change
              FROM    dbo.Tasks
              UNION ALL
              SELECT  FinishedAt AS EventTime ,
                      -1 AS Change
              FROM    dbo.Tasks
            ) AS TaskEvents
    ORDER BY EventTime, Change DESC ;
  END ;
GO
Run Code Online (Sandbox Code Playgroud)

它会在以下时间以 2M 间隔查找并打印一个间隙,热缓存:

1 gap: Elapsed: 00:00:01.4852029 00:00:01.4444307 00:00:01.4644152
Run Code Online (Sandbox Code Playgroud)

它在以下时间中以 2M 间隔查找并打印 2M-1 间隙,热缓存:

2M-1 gaps Elapsed: 00:00:08.8576637 00:00:08.9123053 00:00:09.0372344 00:00:08.8545477
Run Code Online (Sandbox Code Playgroud)

这是一个非常简单的解决方案 - 我花了 10 分钟来开发。刚毕业的大学毕业生都能想出这个办法。在数据库方面,执行计划是一个简单的合并连接,使用很少的CPU和内存。

编辑:说实话,我在不同的机器上运行客户端和服务器。