Mar*_*bak 6 sql sql-server recursion common-table-expression
I have a table that contains hospital visits for patients. I am trying to flag visits in which a visits' begin_date overlaps the previous visits' end_date + 90 days. However, the caveat to this is that once a visit is flagged as an overlap visit, that visit should not be used to assess an overlap with another visit. Let me explain with an example.
Table
visitID patientid begin_date end_date
1 23 1/12/2018 1/14/2018
2 23 1/30/2018 2/14/2018
3 23 4/20/2018 4/22/2018
4 23 5/02/2018 5/03/2018
5 23 7/23/2018 7/28/2018
Run Code Online (Sandbox Code Playgroud)
In the example above, the patient had 5 visits. Visit 2's begin_date was in range of visit 1's end_date + 90 days, so visit 2 should be flagged. Once visit 2 is flagged, that row should not be used in the analysis for any future visits. Conceptually, it would be like removing visit 2 and beginning the analysis again.
interim stage (visit 2 is removed, and analysis begins again)
visitID patientid begin_date end_date
1 23 1/12/2018 1/14/2018
3 23 4/20/2018 4/22/2018
4 23 5/02/2018 5/03/2018
5 23 7/23/2018 7/28/2018
Run Code Online (Sandbox Code Playgroud)
So even though visit 3 overlaps with visit 2, since visit 2 has been removed, visit 3 will not be flagged as the previous visit (now visit 1) is more than end_date + 90 days away from visit 3's begin_date. Then, visit 4 should be flagged as it overlaps with a visit that was not flagged (visit 3). So since visit 4 is flagged, then visit 5 will be removed as it's begin_date is in the range of visit 3's end_date + 90 days.
Anticipated output
visitID patientid begin_date end_date flag
1 23 1/12/2018 1/14/2018 0
2 23 1/30/2018 2/14/2018 1
3 23 4/20/2018 4/22/2018 0
4 23 5/02/2018 5/03/2018 1
5 23 7/23/2018 7/28/2018 1
Run Code Online (Sandbox Code Playgroud)
@gordonlinoff answered a very similar question here, but I am running into issues using recursive CTEs. The difference between the questions is that this question needs to reference another column (end_date), rather than a single date column. Recursive CTEs are still a new concept to me, but I hope this will help solidify the concept.
My attempt to solve this puzzle (piggy backing off of @gordonlinoff):
with vt as (
select vt.*, row_number() over (partition by patientid order by begin_date) as seqnum
from visits_table vt
),
cte as (
select vt.visit, vt.patientid, vt.begin_date, vt.end_date, vt.begin_date as first_begin_date, seqnum
from vt
where seqnum = 1
union all
select vt.visit, vt.patientid, vt.begin_date, vt.end_date,
(case when vt.begin_date > dateadd(day, 90, cte.end_date) then vt.begin_date else cte.end_date end),
vt.seqnum
from cte join
vt
on vt.seqnum = cte.seqnum + 1 and vt.patientid = cte.patientid
)
select cte.visit, cte.patientid, cte.begin_date, cte.end_date,
(case when first_begin_date = begin_date then 0 else 1 end) as flag
from cte
order by cte.patientid, cte.begin_date;
Run Code Online (Sandbox Code Playgroud)
My edits are improperly referencing the end_date based on the results. However, I cannot find where the comparison between begin_date and end_date should be.
Dataset:
create table visits_table (visit int,patientid int,begin_date date, end_date date);
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (1,23,'1/12/2018','1/14/2018')
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (2,23,'1/30/2018','2/14/2018')
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (3,23,'4/20/2018','4/22/2018')
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES (4,23,'5/02/2018','5/03/2018')
Run Code Online (Sandbox Code Playgroud)
我调整了您的样本数据,使访问 5 位于访问 3 的结束日期 + 90 天的范围内。访问 3 结束日期是2018-04-22。如果再加上 90 天,那就是2018-07-21。您在问题中的示例数据已访问 5 个开始日期2018-07-23,该日期与 不重叠2018-07-21。因此,我对此进行了调整,2018-07-20以使这些日期重叠。
create table visits_table (visit int,patientid int,begin_date date, end_date date);
INSERT INTO visits_table (visit, patientid, begin_date, end_date) VALUES
(1,23,'2018-01-12','2018-01-14'),
(2,23,'2018-01-30','2018-02-14'),
(3,23,'2018-04-20','2018-04-22'),
(4,23,'2018-05-02','2018-05-03'),
(5,23,'2018-07-20','2018-07-28');
Run Code Online (Sandbox Code Playgroud)
您的查询非常接近,您只需计算“上一个”间隔的开始日期和结束日期(first_begin_date, first_end_date)。
如果“当前”间隔与“上一个”间隔重叠,则将“上一个”间隔带入当前行。
取消注释下面查询中的行以查看所有中间值。
with
vt
as
(
select vt.*, row_number() over (partition by patientid order by begin_date) as seqnum
from visits_table vt
)
,cte
as
(
select
vt.visit
,vt.patientid
,vt.begin_date as first_begin_date
,vt.end_date as first_end_date
,vt.begin_date
,vt.end_date
,seqnum
from vt
where seqnum = 1
union all
select
vt.visit
,vt.patientid
,case when vt.begin_date <= dateadd(day, 90, cte.first_end_date)
then cte.first_begin_date -- they overlap, keep the previous interval
else vt.begin_date
end as first_begin_date
,case when vt.begin_date <= dateadd(day, 90, cte.first_end_date)
then cte.first_end_date -- they overlap, keep the previous interval
else vt.end_date
end as first_end_date
,vt.begin_date
,vt.end_date
,vt.seqnum
from
cte
inner join vt
on vt.seqnum = cte.seqnum + 1
and vt.patientid = cte.patientid
)
select
cte.visit
,cte.patientid
,cte.begin_date
,cte.end_date
,case when first_begin_date = begin_date
then 0 else 1
end as flag
-- ,DATEADD(day, 90, cte.end_date) AS enddd
-- ,*
from cte
order by cte.patientid, cte.begin_date;
Run Code Online (Sandbox Code Playgroud)
结果
+-------+-----------+------------+------------+------+
| visit | patientid | begin_date | end_date | flag |
+-------+-----------+------------+------------+------+
| 1 | 23 | 2018-01-12 | 2018-01-14 | 0 |
| 2 | 23 | 2018-01-30 | 2018-02-14 | 1 |
| 3 | 23 | 2018-04-20 | 2018-04-22 | 0 |
| 4 | 23 | 2018-05-02 | 2018-05-03 | 1 |
| 5 | 23 | 2018-07-20 | 2018-07-28 | 1 |
+-------+-----------+------------+------------+------+
Run Code Online (Sandbox Code Playgroud)