How to properly apply recursive CTE?

Mar*_*bak 6 sql sql-server recursion common-table-expression

I have a table that contains hospital visits for patients. I am trying to flag visits whose begin_date falls within the previous visit's end_date + 90 days. The caveat is that once a visit is flagged as an overlap visit, it should not be used to assess an overlap with any other visit. Let me explain with an example.

Table

visitID     patientid    begin_date  end_date
1           23           1/12/2018   1/14/2018
2           23           1/30/2018   2/14/2018
3           23           4/20/2018   4/22/2018
4           23           5/02/2018   5/03/2018
5           23           7/23/2018   7/28/2018

In the example above, the patient had 5 visits. Visit 2's begin_date was in range of visit 1's end_date + 90 days, so visit 2 should be flagged. Once visit 2 is flagged, that row should not be used in the analysis for any future visits. Conceptually, it would be like removing visit 2 and beginning the analysis again.

interim stage (visit 2 is removed, and analysis begins again)

visitID     patientid    begin_date  end_date
1           23           1/12/2018   1/14/2018
3           23           4/20/2018   4/22/2018
4           23           5/02/2018   5/03/2018
5           23           7/23/2018   7/28/2018

So even though visit 3 overlaps with visit 2, since visit 2 has been removed, visit 3 will not be flagged: the previous visit (now visit 1) ended more than 90 days before visit 3's begin_date. Then visit 4 should be flagged, as it overlaps with a visit that was not flagged (visit 3). Because visit 4 is flagged, visit 5 is compared against visit 3 and should be flagged as well, as its begin_date is in the range of visit 3's end_date + 90 days.

Anticipated output

visitID     patientid    begin_date  end_date    flag
1           23           1/12/2018   1/14/2018   0
2           23           1/30/2018   2/14/2018   1
3           23           4/20/2018   4/22/2018   0
4           23           5/02/2018   5/03/2018   1
5           23           7/23/2018   7/28/2018   1

@gordonlinoff answered a very similar question here, but I am running into issues using recursive CTEs. The difference between the questions is that this question needs to reference another column (end_date), rather than a single date column. Recursive CTEs are still a new concept to me, but I hope this will help solidify the concept.

My attempt to solve this puzzle (piggybacking off of @gordonlinoff):

with vt as (
          select vt.*, row_number() over (partition by patientid order by begin_date) as seqnum
          from visits_table vt
         ),
         cte as (
          select vt.visit, vt.patientid, vt.begin_date, vt.end_date, vt.begin_date as first_begin_date, seqnum
          from vt
          where seqnum = 1
          union all
          select vt.visit, vt.patientid, vt.begin_date, vt.end_date,
                 (case when vt.begin_date > dateadd(day, 90, cte.end_date) then vt.begin_date else cte.end_date end),
                 vt.seqnum
          from cte join
               vt
               on vt.seqnum = cte.seqnum + 1 and vt.patientid = cte.patientid
         )
    select cte.visit, cte.patientid, cte.begin_date, cte.end_date,
           (case when first_begin_date = begin_date then 0 else 1 end) as flag
    from cte
    order by cte.patientid, cte.begin_date;

Judging by the results, my edits reference end_date incorrectly, but I cannot work out where the comparison between begin_date and end_date should go.

Dataset:

create table visits_table (visit int,patientid int,begin_date date, end_date date);

INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (1,23,'1/12/2018','1/14/2018')
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (2,23,'1/30/2018','2/14/2018')
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (3,23,'4/20/2018','4/22/2018')
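INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (4,23,'5/02/2018','5/03/2018')
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (5,23,'7/23/2018','7/28/2018')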

Vla*_*nov 1

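I adjusted your sample data so that visit 5 falls within visit 3's end_date + 90 days. Visit 3's end date is 2018-04-22; adding 90 days gives 2018-07-21. In the sample data in your question, visit 5 begins on 2018-07-23, which does not overlap with 2018-07-21. So I changed it to 2018-07-20 to make these dates overlap.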

create table visits_table (visit int,patientid int,begin_date date, end_date date);

INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES 
(1,23,'2018-01-12','2018-01-14'),
(2,23,'2018-01-30','2018-02-14'),
(3,23,'2018-04-20','2018-04-22'),
(4,23,'2018-05-02','2018-05-03'),
(5,23,'2018-07-20','2018-07-28');
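The 90-day arithmetic behind that adjustment can be checked outside the database. This is only a quick sketch in Python (not T-SQL); the variable names are made up for illustration, and it assumes the comparison is inclusive (`<=`), as in the query that follows.

```python
from datetime import date, timedelta

# Visit 3 ends on 2018-04-22; the window stays open for 90 more days.
visit3_end = date(2018, 4, 22)
window_end = visit3_end + timedelta(days=90)

# Begin date of visit 5 in the question, and the adjusted one used here.
original_begin = date(2018, 7, 23)
adjusted_begin = date(2018, 7, 20)

print(window_end)                      # 2018-07-21
print(original_begin <= window_end)    # False: outside the window
print(adjusted_begin <= window_end)    # True: inside the window
```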

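Your query is pretty close. You just need to calculate both the begin date and the end date of the "previous" interval (first_begin_date, first_end_date).

If the "current" interval overlaps with the "previous" interval, carry the "previous" interval over into the current row.

Uncomment the lines in the query below to see all the intermediate values.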

with
vt
as
(
    select vt.*, row_number() over (partition by patientid order by begin_date) as seqnum
    from visits_table vt
)
,cte
as
(
    select
        vt.visit
        ,vt.patientid
        ,vt.begin_date as first_begin_date
        ,vt.end_date as first_end_date
        ,vt.begin_date
        ,vt.end_date
        ,seqnum
    from vt
    where seqnum = 1

    union all

    select
        vt.visit
        ,vt.patientid
        ,case when vt.begin_date <= dateadd(day, 90, cte.first_end_date)
            then cte.first_begin_date -- they overlap, keep the previous interval
            else vt.begin_date
        end as first_begin_date
        ,case when vt.begin_date <= dateadd(day, 90, cte.first_end_date)
            then cte.first_end_date -- they overlap, keep the previous interval
            else vt.end_date
        end as first_end_date
        ,vt.begin_date
        ,vt.end_date
        ,vt.seqnum
    from
        cte
        inner join vt
            on  vt.seqnum = cte.seqnum + 1
            and vt.patientid = cte.patientid
)
select
    cte.visit
    ,cte.patientid
    ,cte.begin_date
    ,cte.end_date
    ,case when first_begin_date = begin_date 
        then 0 else 1
    end as flag
--  ,DATEADD(day, 90, cte.end_date) AS enddd
--  ,*
from cte
order by cte.patientid, cte.begin_date;

Result

+-------+-----------+------------+------------+------+
| visit | patientid | begin_date |  end_date  | flag |
+-------+-----------+------------+------------+------+
|     1 |        23 | 2018-01-12 | 2018-01-14 |    0 |
|     2 |        23 | 2018-01-30 | 2018-02-14 |    1 |
|     3 |        23 | 2018-04-20 | 2018-04-22 |    0 |
|     4 |        23 | 2018-05-02 | 2018-05-03 |    1 |
|     5 |        23 | 2018-07-20 | 2018-07-28 |    1 |
+-------+-----------+------------+------------+------+
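As a cross-check, the same carry-forward rule can be written as a plain loop. The following is a minimal sketch in Python rather than T-SQL; `flag_visits` and `anchor_end` are hypothetical names introduced here, and it assumes visits are processed per patient in begin_date order, just like the seqnum ordering in the query.

```python
from datetime import date, timedelta

def flag_visits(visits, window_days=90):
    """visits: iterable of (visit, patientid, begin_date, end_date) tuples.

    Returns {visit: flag}. A visit is flagged (1) when its begin_date is on
    or before the end_date of the latest unflagged visit plus window_days.
    """
    flags = {}
    anchor_end = {}  # patientid -> end_date of the latest unflagged visit
    for visit, patientid, begin, end in sorted(visits, key=lambda v: (v[1], v[2])):
        prev_end = anchor_end.get(patientid)
        if prev_end is not None and begin <= prev_end + timedelta(days=window_days):
            flags[visit] = 1               # overlaps: keep the previous anchor
        else:
            flags[visit] = 0               # no overlap: this visit is the new anchor
            anchor_end[patientid] = end
    return flags

# Adjusted sample data from this answer (visit 5 begins on 2018-07-20).
visits = [
    (1, 23, date(2018, 1, 12), date(2018, 1, 14)),
    (2, 23, date(2018, 1, 30), date(2018, 2, 14)),
    (3, 23, date(2018, 4, 20), date(2018, 4, 22)),
    (4, 23, date(2018, 5, 2), date(2018, 5, 3)),
    (5, 23, date(2018, 7, 20), date(2018, 7, 28)),
]

flags = flag_visits(visits)
print(flags)  # {1: 0, 2: 1, 3: 0, 4: 1, 5: 1}
```

A flagged visit never updates `anchor_end`, which is the same idea as keeping first_begin_date and first_end_date unchanged in the recursive member of the CTE.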