How to identify the first gap in multiple start and end date ranges for each distinct member in T-SQL

Vij*_*jay 5 t-sql sql-server-2008 gaps-and-islands

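I have been working on this without getting anywhere, and the deadline is approaching. As shown below, there are also more than a million rows. I would appreciate your help with the following.

Goal: group the results by member and build a continuous coverage range for each member by combining the individual date ranges that overlap or run consecutively, with no breaks between the start and end days of the range.

I have data in the following format: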

| MemberCode | ClaimID |  StartDate |    EndDate |
--------------------------------------------------
|      00001 |  012345 | 2010-01-15 | 2010-01-20 |
|      00001 |  012350 | 2010-01-19 | 2010-01-22 |
|      00001 |  012352 | 2010-01-20 | 2010-01-25 |
|      00001 |  012355 | 2010-01-26 | 2010-01-30 |
|      00002 |  012357 | 2010-01-20 | 2010-01-25 |
|      00002 |  012359 | 2010-01-30 | 2010-02-05 |
|      00002 |  012360 | 2010-02-04 | 2010-02-15 |
|      00003 |  012365 | 2010-02-15 | 2010-02-30 |

...

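In the above, member (00001) is a valid member because there is a continuous date range from 2010-01-15 to 2010-01-30 (no gaps). Note that claim ID 012355 for this member starts the day after the end date of claim ID 012352. This is still valid, because it forms a continuous range.

However, member (00002) should be an invalid member, because there is a 5-day gap between the end date of claim ID 012357 and the start date of claim ID 012359.

What I want is a list of only those members who are covered on every single day of a continuous date range, with no gaps between MIN(StartDate) and MAX(EndDate) for each distinct member. Members with gaps should be discarded.

Thanks in advance.

Update:

This is where I have got to. Note: FILLED_DT = Start Date and PresCoverEndDT = End Date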

SELECT PresCoverEndDT, FILLED_DT
FROM
(
    SELECT DISTINCT FILLED_DT, ROW_NUMBER() OVER (ORDER BY FILLED_DT) RN
    FROM Temp_Claims_PRIOR_STEP_5 T1
    WHERE NOT EXISTS
        (SELECT * FROM Temp_Claims_PRIOR_STEP_5 T2
         WHERE T1.FILLED_DT > T2.FILLED_DT AND T1.FILLED_DT < T2.PresCoverEndDT
         AND T1.MBR_KEY = T2.MBR_KEY)
) T1
JOIN
(
    SELECT DISTINCT PresCoverEndDT, ROW_NUMBER() OVER (ORDER BY PresCoverEndDT) RN
    FROM Temp_Claims_PRIOR_STEP_5 T1
    WHERE NOT EXISTS
        (SELECT * FROM Temp_Claims_PRIOR_STEP_5 T2
         WHERE T1.PresCoverEndDT > T2.FILLED_DT AND T1.PresCoverEndDT < T2.PresCoverEndDT
         AND T1.MBR_KEY = T2.MBR_KEY)
) T2
    ON T1.RN - 1 = T2.RN
WHERE PresCoverEndDT < FILLED_DT

The code above seems to be wrong, because I get only one row, and that row is also incorrect. The output I want has just one column, like this:

Valid_Member_Code

00001

00007

00009

... and so on.

Mic*_*uen 5

Try this: http://www.sqlfiddle.com/#!3/c3365/20

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(*);

See the query progression here: http://www.sqlfiddle.com/#!3/c3365/20
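
The fiddle's schema is not reproduced in this answer. To run the queries locally, a setup along these lines should work. The table name tbl and the columns membercode, startdate and enddate come from the queries in this answer; the claimid column and all column types are assumptions. The last end date in the question, 2010-02-30, is not a valid date; the output further down shows 2010-03-02 for that row, so that value is used here.

```sql
-- Hypothetical setup for the sample data in the question.
-- Column types are assumed; membercode is an int because the outputs show 1, 2, 3.
create table tbl
(
  membercode int        not null,
  claimid    varchar(6) not null,  -- assumed column, not used by the queries
  startdate  date       not null,
  enddate    date       not null
);

insert into tbl (membercode, claimid, startdate, enddate) values
  (1, '012345', '20100115', '20100120'),
  (1, '012350', '20100119', '20100122'),
  (1, '012352', '20100120', '20100125'),
  (1, '012355', '20100126', '20100130'),
  (2, '012357', '20100120', '20100125'),
  (2, '012359', '20100130', '20100205'),
  (2, '012360', '20100204', '20100215'),
  (3, '012365', '20100215', '20100302');  -- 2010-02-30 in the question is not a valid date
```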


How it works: compare the current end date with the next start date and check the date gap:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1;

Output:

| MEMBERCODE |  STARTDATE |    ENDDATE | NEXTSTARTDATE | GAP |
--------------------------------------------------------------
|          1 | 2010-01-15 | 2010-01-20 |    2010-01-19 |  -1 |
|          1 | 2010-01-19 | 2010-01-22 |    2010-01-20 |  -2 |
|          1 | 2010-01-20 | 2010-01-25 |    2010-01-26 |   1 |
|          2 | 2010-01-20 | 2010-01-25 |    2010-01-30 |   5 |
|          2 | 2010-01-30 | 2010-02-05 |    2010-02-04 |  -1 |
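
To see where the gap values and the gap <= 1 threshold come from, datediff can be run on literal dates taken from the sample data. This is only an illustrative check, not part of the solution; the column aliases are made up.

```sql
-- datediff(d, a, b) counts the day boundaries crossed going from a to b.
select
  datediff(d, '20100120', '20100119') as overlapping,  -- -1: next claim starts before this one ends
  datediff(d, '20100125', '20100126') as adjacent,     --  1: next claim starts the following day
  datediff(d, '20100125', '20100130') as broken;       --  5: the 26th through the 29th are uncovered
```

A gap of 0 or less means the two ranges touch or overlap, and a gap of 1 means they run back to back, which is why the filter accepts gap <= 1.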

Then check, per member, whether the number of gap-free claim pairs equals the total number of claim pairs:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode, count(*) as count, sum(case when gap <= 1 then 1 end) as gapless_count
from gaps
group by membercode;

Output:

| MEMBERCODE | COUNT | GAPLESS_COUNT |
--------------------------------------
|          1 |     3 |             3 |
|          2 |     2 |             1 |

Finally, filter them down to the members with no gaps in their claims:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(*);

Output:

| MEMBERCODE |
--------------
|          1 |
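
As a possible simplification, assuming startdate and enddate are never null (so gap is never null in the inner-join version), requiring every gap to be at most 1 is the same as requiring the largest gap to be at most 1. A sketch of that variant:

```sql
with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode
from gaps
group by membercode
having max(gap) <= 1;  -- every adjacent pair overlaps, touches, or runs back to back
```

For the sample data this keeps member 1 only, the same as the query above: member 1's largest gap is 1, member 2's is 5, and member 3 has no pair to compare.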

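Note that you don't need COUNT(*) > 1 to detect members with two or more claims. Instead of LEFT JOIN we use JOIN, which automatically discards members who have yet to make a second claim. If you opt to use LEFT JOIN instead (same output as above), this is the (longer) version: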

with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(gap)
and count(*) > 1; -- members who have two or more claims only

Here is how to view the data of the query above before filtering:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select * from gaps;

Output:

| MEMBERCODE |  STARTDATE |    ENDDATE | NEXTSTARTDATE |    GAP |
-----------------------------------------------------------------
|          1 | 2010-01-15 | 2010-01-20 |    2010-01-19 |     -1 |
|          1 | 2010-01-19 | 2010-01-22 |    2010-01-20 |     -2 |
|          1 | 2010-01-20 | 2010-01-25 |    2010-01-26 |      1 |
|          1 | 2010-01-26 | 2010-01-30 |        (null) | (null) |
|          2 | 2010-01-20 | 2010-01-25 |    2010-01-30 |      5 |
|          2 | 2010-01-30 | 2010-02-05 |    2010-02-04 |     -1 |
|          2 | 2010-02-04 | 2010-02-15 |        (null) | (null) |
|          3 | 2010-02-15 | 2010-03-02 |        (null) | (null) |

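Edit after the requirement clarification:

In your clarification, you want to include members who have yet to make a second claim too. Do this instead: http://sqlfiddle.com/#!3/c3365/22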

with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(gap)
-- members who have yet to have a second claim are valid too
or count(nextstartdate) = 0; 

Output:

| MEMBERCODE |
--------------
|          1 |
|          3 |

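The technique is to count the member's nextstartdate values. If a member has no next start date (i.e. count(nextstartdate) = 0), they have only a single claim and are valid too, so just append the following OR condition: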

or count(nextstartdate) = 0; 

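Actually, the condition below would suffice too. I wanted to make the query more self-documenting, though, hence I recommend counting the member's nextstartdate. Here is an alternative condition for detecting members who have yet to make a second claim: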

or count(*) = 1;
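
To see why either OR condition picks up single-claim members, it can help to list the aggregates side by side for the LEFT JOIN version. The column aliases below are made up for illustration, and the comments assume enddate is never null:

```sql
with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode
  ,count(*)             as claim_rows     -- one row per claim
  ,count(nextstartdate) as next_claims    -- the null on each member's last claim is not counted
  ,count(gap)           as measured_gaps  -- gap is null exactly when nextstartdate is null
  ,sum(case when gap <= 1 then 1 end) as gapless_count
from gaps
group by membercode
order by membercode;
```

With the sample data, member 3 has claim_rows = 1 and next_claims = 0, and its gapless_count is null because there is nothing to sum. Comparing null with count(gap) is not true, so the first comparison alone does not keep member 3 and the extra OR condition is needed.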

By the way, we also have to change the comparison from this:

sum(case when gap <= 1 then 1 end) = count(*)

to this (as we are now using LEFT JOIN):

sum(case when gap <= 1 then 1 end) = count(gap)
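
One caveat worth checking against your data: the queries above compare each claim only with the claim that starts next. If a member has a long claim that fully contains later, shorter claims (for example 2010-01-01 to 2010-01-31, then 2010-01-05 to 2010-01-10, then 2010-01-15 to 2010-01-20), the second and third claims look 5 days apart even though the first claim covers the whole period, so the member would be rejected. The sample data has no such case. If yours might, one alternative is to compare each claim's start date with the latest end date among all of that member's earlier-starting claims. The sketch below uses CROSS APPLY because SQL Server 2008 does not support ORDER BY inside OVER for aggregates such as MAX. It has not been run against the million-row table and would probably need an index on (membercode, startdate) to perform acceptably:

```sql
select m.membercode
from (select distinct membercode from tbl) m
where not exists
(
  select 1
  from tbl c
  cross apply
  (
    -- latest end date among this member's claims that start before claim c
    select max(p.enddate) as prev_max_end
    from tbl p
    where p.membercode = c.membercode
      and p.startdate < c.startdate
  ) x
  where c.membercode = m.membercode
    -- for a member's first claim prev_max_end is null, the comparison is unknown,
    -- and the row is skipped
    and datediff(d, x.prev_max_end, c.startdate) > 1
);
```

For the sample data this should return members 1 and 3. Single-claim members count as valid here, as in the clarified version above.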