How to recursively find gaps where 90 days have passed between rows

Ind*_*ent 20 sql-server sql-server-2014 recursive

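This is a fairly trivial task in my C# homeworld, but I have not yet managed it in SQL and would prefer to solve it set-based (without cursors). The result set should come from a query like this.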

SELECT SomeId, MyDate, 
    dbo.udfLastHitRecursive(param1, param2, MyDate) as 'Qualifying'
FROM T

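How it should work

I send those three parameters into the UDF.
The UDF internally uses the parameters to fetch the related rows, up to 90 days older, from a view.
The UDF traverses 'MyDate' and returns 1 if the row should be included in a total calculation.
If it should not, it returns 0. This is named "Qualifying" here.

What the udf will do

List the rows in date order. Calculate the days between rows. The first row in the result set defaults to Hit = 1. If the difference is up to 90, pass on to the next row until the sum of the gaps reaches 90 days (the 90th day must have passed). When that is reached, set Hit to 1 and reset the gap to 0. It would also work to omit the non-qualifying rows from the result instead.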

                                          |(column from the udf, which does not work yet)
Date              Calc_date     MaxDiff   | Qualifying
2014-01-01 11:00  2014-01-01    0         | 1
2014-01-03 10:00  2014-01-01    2         | 0
2014-01-04 09:30  2014-01-03    1         | 0
2014-04-01 10:00  2014-01-04    87        | 0
2014-05-01 11:00  2014-04-01    30        | 1

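In the table above, the MaxDiff column is the gap from the date in the previous row. The problem with my attempts so far is that I cannot ignore the second-to-last row in the sample above.

[EDIT]
As per the comments, I have added a tag and pasted the udf I have just compiled. It is only a placeholder, though, and will not give a useful result.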

;WITH cte (someid, otherkey, mydate, cost) AS
(
    SELECT someid, otherkey, mydate, cost
    FROM dbo.vGetVisits
    WHERE someid = @someid AND VisitCode = 3 AND otherkey = @otherkey 
    AND CONVERT(Date,mydate) = @VisitDate

    UNION ALL

    SELECT top 1 e.someid, e.otherkey, e.mydate, e.cost
    FROM dbo.vGetVisits AS E
    WHERE CONVERT(date, e.mydate) 
        BETWEEN DateAdd(dd,-90,CONVERT(Date,@VisitDate)) AND CONVERT(Date,@VisitDate)
        AND e.someid = @someid AND e.VisitCode = 3 AND e.otherkey = @otherkey 
        AND CONVERT(Date,e.mydate) = @VisitDate
        order by e.mydate
)

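I have another query, defined separately, which is closer to what I need, but it is blocked by the fact that I cannot calculate on windowed columns. I also tried a similar one that gives more or less the same output, using LAG() over MyDate wrapped in a DATEDIFF.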

SELECT
    t.Mydate, t.VisitCode, t.Cost, t.SomeId, t.otherkey, t.MaxDiff, t.DateDiff
FROM 
(
    SELECT *,
        MaxDiff = LAST_VALUE(Diff.Diff)  OVER (
            ORDER BY Diff.Mydate ASC
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM 
    (
        SELECT *,
            Diff =  ISNULL(DATEDIFF(DAY, LAST_VALUE(r.Mydate) OVER (
                        ORDER BY r.Mydate ASC
                            ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), 
                                r.Mydate),0),
            DateDiff =  ISNULL(LAST_VALUE(r.Mydate) OVER (
                        ORDER BY r.Mydate ASC
                            ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING), 
                                r.Mydate)
        FROM dbo.vGetVisits AS r
        WHERE r.VisitCode = 3 AND r.SomeId = @SomeID AND r.otherkey = @otherkey
    ) AS Diff
) AS t
WHERE t.VisitCode = 3 AND t.SomeId = @SomeId AND t.otherkey = @otherkey
    AND t.Diff <= 90
ORDER BY
    t.Mydate ASC;
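As a rough sketch of the LAG() variant mentioned above, the gap to the previous row could be computed as follows. The table variable @Visits is a stand-in made up for illustration (it holds the sample dates from the table earlier in the question); the real query would read from dbo.vGetVisits with the same filters.

```sql
-- Stand-in data: the sample dates from the question (assumption, not the real view)
DECLARE @Visits AS table (MyDate datetime PRIMARY KEY);

INSERT @Visits (MyDate)
VALUES
    ('2014-01-01 11:00'),
    ('2014-01-03 10:00'),
    ('2014-01-04 09:30'),
    ('2014-04-01 10:00'),
    ('2014-05-01 11:00');

SELECT
    V.MyDate,
    -- Date of the previous row; the first row falls back to its own date
    PrevDate = LAG(V.MyDate, 1, V.MyDate) OVER (ORDER BY V.MyDate),
    -- Day boundaries crossed since the previous row (0 for the first row)
    Diff = DATEDIFF(DAY,
               LAG(V.MyDate, 1, V.MyDate) OVER (ORDER BY V.MyDate),
               V.MyDate)
FROM @Visits AS V
ORDER BY V.MyDate;
```

For the sample rows, Diff comes out as 0, 2, 1, 87, 30, matching the MaxDiff column. This only measures row-to-row gaps; it cannot carry a running total that resets after each hit, which is the part that remains blocked.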

Pau*_*ite 25

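As I read the question, the basic recursive algorithm required is:

  1. Return the row with the earliest date in the set
  2. Set that date as "current"
  3. Find the row with the earliest date more than 90 days after the current date
  4. Repeat from step 2 until no more rows are found

This is relatively easy to implement with a recursive common table expression.

For example, using the following sample data (based on the question):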

DECLARE @T AS table (TheDate datetime PRIMARY KEY);

INSERT @T (TheDate)
VALUES
    ('2014-01-01 11:00'),
    ('2014-01-03 10:00'),
    ('2014-01-04 09:30'),
    ('2014-04-01 10:00'),
    ('2014-05-01 11:00'),
    ('2014-07-01 09:00'),
    ('2014-07-31 08:00');

The recursive code is:

WITH CTE AS
(
    -- Anchor:
    -- Start with the earliest date in the table
    SELECT TOP (1)
        T.TheDate
    FROM @T AS T
    ORDER BY
        T.TheDate

    UNION ALL

    -- Recursive part   
    SELECT
        SQ1.TheDate
    FROM 
    (
        -- Recursively find the earliest date that is 
        -- more than 90 days after the "current" date
        -- and set the new date as "current".
        -- ROW_NUMBER + rn = 1 is a trick to get
        -- TOP in the recursive part of the CTE
        SELECT
            T.TheDate,
            rn = ROW_NUMBER() OVER (
                ORDER BY T.TheDate)
        FROM CTE
        JOIN @T AS T
            ON T.TheDate > DATEADD(DAY, 90, CTE.TheDate)
    ) AS SQ1
    WHERE
        SQ1.rn = 1
)
SELECT 
    CTE.TheDate 
FROM CTE
OPTION (MAXRECURSION 0);

The results are:

╔═════════════════════════╗
║         TheDate         ║
╠═════════════════════════╣
║ 2014-01-01 11:00:00.000 ║
║ 2014-05-01 11:00:00.000 ║
║ 2014-07-31 08:00:00.000 ║
╚═════════════════════════╝

With an index that has TheDate as its leading key, the execution plan is very efficient:

[Image: execution plan]

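You could choose to wrap this in a function and execute it directly against the view mentioned in the question, but my instincts are against it. Usually, performance is better when you select the rows from the view into a temporary table, provide an appropriate index on the temporary table, and then apply the logic above. The details depend on the details of the view, but this is my general experience.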
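A minimal sketch of that temporary-table approach might look like the following. Everything about the source is an assumption here: #ViewStandIn is a made-up substitute for dbo.vGetVisits so that the script runs on its own, the column types are guessed, and the parameter values are hypothetical.

```sql
-- Stand-in for dbo.vGetVisits so the sketch runs on its own.
-- Column types are assumptions; the question does not show them.
CREATE TABLE #ViewStandIn
(
    someid int NOT NULL,
    otherkey int NOT NULL,
    VisitCode int NOT NULL,
    mydate datetime NOT NULL
);

INSERT #ViewStandIn (someid, otherkey, VisitCode, mydate)
VALUES
    (1, 1, 3, '2014-01-01 11:00'),
    (1, 1, 3, '2014-01-03 10:00'),
    (1, 1, 3, '2014-04-01 10:00'),
    (1, 1, 3, '2014-05-01 11:00'),
    (2, 1, 3, '2014-02-01 09:00');  -- different someid, filtered out below

-- Hypothetical parameter values
DECLARE @SomeId int = 1, @OtherKey int = 1;

-- 1. Copy only the rows of interest into a temporary table
SELECT V.mydate AS TheDate
INTO #Visits
FROM #ViewStandIn AS V
WHERE V.someid = @SomeId
    AND V.otherkey = @OtherKey
    AND V.VisitCode = 3;

-- 2. Index the temporary table with TheDate as the leading key
CREATE CLUSTERED INDEX cx_Visits ON #Visits (TheDate);

-- 3. Apply the same recursive logic to the temporary table
WITH CTE AS
(
    -- Anchor: earliest date
    SELECT TOP (1)
        V.TheDate
    FROM #Visits AS V
    ORDER BY
        V.TheDate

    UNION ALL

    -- Recursive part: earliest date more than 90 days after "current"
    SELECT
        SQ1.TheDate
    FROM
    (
        SELECT
            V.TheDate,
            rn = ROW_NUMBER() OVER (
                ORDER BY V.TheDate)
        FROM CTE
        JOIN #Visits AS V
            ON V.TheDate > DATEADD(DAY, 90, CTE.TheDate)
    ) AS SQ1
    WHERE
        SQ1.rn = 1
)
SELECT
    CTE.TheDate
FROM CTE
OPTION (MAXRECURSION 0);

DROP TABLE #Visits, #ViewStandIn;
```

The clustered index is deliberately non-unique in this sketch, since nothing in the question guarantees that the view returns distinct dates.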

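For completeness (and prompted by ypercube's answer), I should mention that my other go-to solution for this type of problem (until T-SQL gets proper ordered set functions) is a SQLCLR cursor (see my answer here for an example of the technique). This performs much better than a T-SQL cursor, and is convenient for those with skills in .NET languages and the ability to run SQLCLR in their production environment. It may not offer much over the recursive solution in this scenario, because the majority of the cost is the sort, but it is worth mentioning.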
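The question asks for a 0/1 Qualifying value on every row rather than only the qualifying dates. One possible way to get that, sketched here under the assumption that dates are unique (as the primary key on the sample table implies), is to outer join the original rows to the recursive CTE:

```sql
-- Same sample data as above
DECLARE @T AS table (TheDate datetime PRIMARY KEY);

INSERT @T (TheDate)
VALUES
    ('2014-01-01 11:00'),
    ('2014-01-03 10:00'),
    ('2014-01-04 09:30'),
    ('2014-04-01 10:00'),
    ('2014-05-01 11:00'),
    ('2014-07-01 09:00'),
    ('2014-07-31 08:00');

WITH CTE AS
(
    -- Anchor: earliest date in the table
    SELECT TOP (1)
        T.TheDate
    FROM @T AS T
    ORDER BY
        T.TheDate

    UNION ALL

    -- Recursive part: earliest date more than 90 days after "current"
    SELECT
        SQ1.TheDate
    FROM
    (
        SELECT
            T.TheDate,
            rn = ROW_NUMBER() OVER (
                ORDER BY T.TheDate)
        FROM CTE
        JOIN @T AS T
            ON T.TheDate > DATEADD(DAY, 90, CTE.TheDate)
    ) AS SQ1
    WHERE
        SQ1.rn = 1
)
SELECT
    T.TheDate,
    -- 1 when the recursive CTE returned this date, otherwise 0
    Qualifying = CASE WHEN C.TheDate IS NULL THEN 0 ELSE 1 END
FROM @T AS T
LEFT JOIN CTE AS C
    ON C.TheDate = T.TheDate
ORDER BY
    T.TheDate
OPTION (MAXRECURSION 0);
```

With the sample data, the three dates listed in the results above (2014-01-01, 2014-05-01 and 2014-07-31) get Qualifying = 1 and the remaining four rows get 0.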


Mik*_*son 10

Since this is a SQL Server 2014 question, I might as well add a natively compiled stored procedure version of a "cursor".

Source table with some data:

create table T 
(
  TheDate datetime primary key
);

go

insert into T(TheDate) values
('2014-01-01 11:00'),
('2014-01-03 10:00'),
('2014-01-04 09:30'),
('2014-04-01 10:00'),
('2014-05-01 11:00'),
('2014-07-01 09:00'),
('2014-07-31 08:00');

A table type that is used as the parameter to the stored procedure. Adjust the bucket_count appropriately.

create type TType as table
(
  ID int not null primary key nonclustered hash with (bucket_count = 16),
  TheDate datetime not null
) with (memory_optimized = on);

And a stored procedure that loops through the table-valued parameter and collects the qualifying rows in @R.

create procedure dbo.GetDates
  @T dbo.TType readonly
with native_compilation, schemabinding, execute as owner 
as
begin atomic with (transaction isolation level = snapshot, language = N'us_english', delayed_durability = on)

  declare @R dbo.TType;
  declare @ID int = 0;
  declare @RowsLeft bit = 1;  
  declare @CurDate datetime = '1901-01-01';
  declare @LastDate datetime = '1901-01-01';

  while @RowsLeft = 1
  begin
    set @ID += 1;

    select @CurDate = T.TheDate
    from @T as T
    where T.ID = @ID

    if @@rowcount = 1
    begin
      if datediff(day, @LastDate, @CurDate) > 90
      begin
        insert into @R(ID, TheDate) values(@ID, @CurDate);
        set @LastDate = @CurDate;
      end;
    end
    else
    begin
      set @RowsLeft = 0;
    end

  end;

  select R.TheDate
  from @R as R;
end

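Code to fill a memory-optimized table variable, which is used as the parameter to the natively compiled stored procedure, and to call the procedure.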

declare @T dbo.TType;

insert into @T(ID, TheDate)
select row_number() over(order by T.TheDate),
       T.TheDate
from T;

exec dbo.GetDates @T;

Result:

TheDate
-----------------------
2014-07-31 08:00:00.000
2014-01-01 11:00:00.000
2014-05-01 11:00:00.000

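Update:

If for some reason you do not need to visit every row in the table, you can do the equivalent of the "jump to next date" version that Paul White implemented in the recursive CTE.

The table type does not need the ID column, and you should not use a hash index.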

create type TType as table
(
  TheDate datetime not null primary key nonclustered
) with (memory_optimized = on);

And the stored procedure uses a select top(1) .. to find the next value.

create procedure dbo.GetDates
  @T dbo.TType readonly
with native_compilation, schemabinding, execute as owner 
as
begin atomic with (transaction isolation level = snapshot, language = N'us_english', delayed_durability = on)

  declare @R dbo.TType;
  declare @RowsLeft bit = 1;  
  declare @CurDate datetime = '1901-01-01';

  while @RowsLeft = 1
  begin

    select top(1) @CurDate = T.TheDate
    from @T as T
    where T.TheDate > dateadd(day, 90, @CurDate)
    order by T.TheDate;

    if @@rowcount = 1
    begin
      insert into @R(TheDate) values(@CurDate);
    end
    else
    begin
      set @RowsLeft = 0;
    end

  end;

  select R.TheDate
  from @R as R;
end


ype*_*eᵀᴹ 5

A solution that uses a cursor.
(First, some needed tables and variables)

-- a table to hold the results
DECLARE @cd TABLE
(   TheDate datetime PRIMARY KEY,
    Qualify INT NOT NULL
);

-- some variables
DECLARE
    @TheDate DATETIME,
    @diff INT,
    @Qualify     INT = 0,
    @PreviousCheckDate DATETIME = '1900-01-01 00:00:00' ;

The actual cursor:

-- declare the cursor
DECLARE c CURSOR
    LOCAL STATIC FORWARD_ONLY READ_ONLY
    FOR
    SELECT TheDate
      FROM T
      ORDER BY TheDate ;

-- using the cursor to fill the @cd table
OPEN c ;

FETCH NEXT FROM c INTO @TheDate ;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @diff = DATEDIFF(day, @PreviousCheckDate, @Thedate) ;
    SET @Qualify = CASE WHEN @diff > 90 THEN 1 ELSE 0 END ;

    INSERT @cd (TheDate, Qualify)
        SELECT @TheDate, @Qualify ;

    SET @PreviousCheckDate = 
            CASE WHEN @diff > 90 
                THEN @TheDate 
                ELSE @PreviousCheckDate END ;

    FETCH NEXT FROM c INTO @TheDate ;
END

CLOSE c;
DEALLOCATE c;

And getting the results:

-- get the results
SELECT TheDate, Qualify
    FROM @cd
    -- WHERE Qualify = 1        -- optional, to see only the qualifying rows
    ORDER BY TheDate ;

Tested at SQLFiddle