Determine date ranges based on groups

0 sql t-sql sql-server gaps-and-islands

I need help writing a SQL query that produces output in the format shown below.

Source data:

Field1    Field2  last_update
1234      ABC     2013-01-01
1234      ABC     2013-01-02
1234      ABC     2013-01-03
1234      ABC     2013-01-06
2345      ABC     2013-01-07   -- Field1 is different from prev. row, new group
2345      ABC     2013-01-08
2345      ABC     2013-01-09
1234      ABC     2013-01-10   -- Field1 is different from prev. row, new group
1234      ABC     2013-01-11
2345      ABC     2013-01-12   -- Field1 is different from prev. row, new group

The result set should look like this:

Field1  Field2  start_date  stop_date
1234    ABC     2013-01-01  2013-01-06
2345    ABC     2013-01-07  2013-01-09
1234    ABC     2013-01-10  2013-01-11
2345    ABC     2013-01-12  2013-01-12

The grouping logic is based on last_update: start_date is the min(last_update) of the group and stop_date is the max(last_update) of the group. Whenever Field1 differs from the previous row (ordered by last_update), a new group starts.

jpw*_*jpw 6

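It looks like you want to find consecutive sequences in your data. This is commonly known as a gaps-and-islands problem, and one solution is to use the row_number() function to identify the groups (islands):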

SELECT 
    Field1, 
    Field2, 
    Start_date = MIN(last_update),
    Stop_date = MAX(last_update)
FROM (
    SELECT 
       Field1, Field2, last_update,
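       -- The difference of the two row numbers stays constant within each
       -- run of consecutive rows that share Field1 and Field2
       ROW_NUMBER() OVER (ORDER BY last_update) -
       ROW_NUMBER() OVER (PARTITION BY Field1, Field2 ORDER BY last_update) AS grp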
    FROM [Source Data]
    ) A
GROUP BY Field1, Field2, grp
ORDER BY MIN(last_update)

With your sample data, the result is:

Field1      Field2 Start_date Stop_date
----------- ------ ---------- ----------
1234        ABC    2013-01-01 2013-01-06
2345        ABC    2013-01-07 2013-01-09
1234        ABC    2013-01-10 2013-01-11
2345        ABC    2013-01-12 2013-01-12
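
To see why subtracting the two row numbers isolates each island, it can help to look at the intermediate values. The following is only a rough sketch, not part of the original answer: it loads the sample rows into an in-memory SQLite database from Python (assuming a SQLite build of 3.25 or later, which is needed for window functions) and exposes both row numbers. The table name source_data and the column aliases rn_all and rn_part are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (Field1, Field2, last_update).
ROWS = [
    (1234, "ABC", "2013-01-01"),
    (1234, "ABC", "2013-01-02"),
    (1234, "ABC", "2013-01-03"),
    (1234, "ABC", "2013-01-06"),
    (2345, "ABC", "2013-01-07"),
    (2345, "ABC", "2013-01-08"),
    (2345, "ABC", "2013-01-09"),
    (1234, "ABC", "2013-01-10"),
    (1234, "ABC", "2013-01-11"),
    (2345, "ABC", "2013-01-12"),
]

# Same window expressions as the inner query of the answer, but with both
# row numbers returned separately (rn_all and rn_part are hypothetical aliases).
TRACE_SQL = """
SELECT Field1, last_update,
       ROW_NUMBER() OVER (ORDER BY last_update) AS rn_all,
       ROW_NUMBER() OVER (PARTITION BY Field1, Field2
                          ORDER BY last_update) AS rn_part
FROM source_data
ORDER BY last_update
"""


def trace_groups():
    """Return (Field1, last_update, rn_all, rn_part, grp) for every sample row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE source_data (Field1 INTEGER, Field2 TEXT, last_update TEXT)"
    )
    conn.executemany("INSERT INTO source_data VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(TRACE_SQL).fetchall()
    conn.close()
    # grp is the difference of the two row numbers, as in the answer's query.
    return [(f1, day, rn_all, rn_part, rn_all - rn_part)
            for f1, day, rn_all, rn_part in rows]


if __name__ == "__main__":
    for row in trace_groups():
        print(row)
```

For the sample data the grp column comes out as 0, 0, 0, 0, 4, 4, 4, 3, 3, 6. Within one island both counters advance by one per row, so their difference does not change; when rows with another Field1 value intervene, only the overall counter advances, so the difference is larger the next time that Field1 value appears. The grp value is only meaningful together with Field1 and Field2, which is why the outer query groups by all three.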

This solution comes from one of the books in the SQL Server MVP Deep Dives series, but I can't remember which one or whom to credit.