GROUP BY possible sequences

Dan*_*itz 6 postgresql gaps-and-islands

I have a table with the following data, using Postgres 9.6:

log_id | sequence | made_at (timestamp)
206480 1 1
206480 1 2
206480 2 3
206480 3 4
206480 1 5
206480 2 6
206480 4 7
206480 5 8
206480 1 9
206480 2 10
206481 1 11
206481 2 12
206481 3 13
206481 4 14

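I need to group and aggregate by ID so that I get arrays of the possible sequences. In the end, I want the data to look like this:

log_id | sequence
206480 {1,1,2,3}
206480 {1,2,4,5}
206480 {1,2}
206481 {1,2,3,4}

I want a new row (with its own sequence) whenever:

  • log_id changes; or
  • the next sequence number is lower than the current one.

There is another column that defines the ordering (a timestamp), but it lives in a different table (I join the two and use that timestamp). I left it out to keep things simple, but we can assume the column is named made_at.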
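
For anyone who wants to reproduce the problem, the sample rows above can be loaded with a short setup script. This is only a sketch: the table name `logs` is taken from the answer below, while the column types are assumptions (in particular, `made_at` is stored as a plain integer standing in for the real timestamp, matching the simplified sample data).

```sql
-- Assumed schema: the types are illustrative, not taken from the question.
create table logs (
    log_id   integer not null,
    sequence integer not null,
    made_at  integer not null   -- stand-in for the real timestamp column
);

-- Sample rows exactly as listed in the question.
insert into logs (log_id, sequence, made_at) values
    (206480, 1,  1),
    (206480, 1,  2),
    (206480, 2,  3),
    (206480, 3,  4),
    (206480, 1,  5),
    (206480, 2,  6),
    (206480, 4,  7),
    (206480, 5,  8),
    (206480, 1,  9),
    (206480, 2, 10),
    (206481, 1, 11),
    (206481, 2, 12),
    (206481, 3, 13),
    (206481, 4, 14);
```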

Dav*_*itz 6

select      log_id
           ,array_agg (sequence)

from       (select      log_id 
                       ,sequence
                       ,count (is_restart) over
                        (
                            partition by    log_id 
                            order by        made_at
                        ) as restart_id

            from        (select      made_at
                                    ,log_id 
                                    ,sequence
                                    ,case 
                                         when sequence <
                                              lag (sequence) over
                                              (
                                                  partition by    log_id 
                                                  order by        made_at
                                              ) 
                                         then 1
                                     end            is_restart

                         from        logs
                         ) l
            ) l

group by    log_id      
           ,restart_id

order by    log_id      
           ,restart_id
;
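
One caveat worth noting: `array_agg` without an `ORDER BY` does not guarantee the order of the elements inside each array. A possible variant, sketched below, carries `made_at` through the middle subquery and orders the aggregate explicitly. The aliases `flagged`, `numbered` and `sequences` are illustrative names, not part of the query above.

```sql
select      log_id
           ,array_agg (sequence order by made_at) as sequences  -- explicit element order

from       (select      log_id
                       ,sequence
                       ,made_at                                 -- kept for the ordered aggregate
                       ,count (is_restart) over
                        (
                            partition by    log_id
                            order by        made_at
                        ) as restart_id

            from       (select      made_at
                                   ,log_id
                                   ,sequence
                                   ,case
                                        when sequence <
                                             lag (sequence) over
                                             (
                                                 partition by    log_id
                                                 order by        made_at
                                             )
                                        then 1
                                    end            is_restart

                        from        logs
                        ) flagged
            ) numbered

group by    log_id
           ,restart_id

order by    log_id
           ,restart_id
;
```

The grouping logic is unchanged; only the order of elements within each array is pinned down.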

Walkthrough

  • Identify the restarts by comparing the current sequence with the previous one (LAG).

    select      made_at
               ,log_id 
               ,sequence
    
               ,case 
                    when sequence <
                         lag (sequence) over
                         (
                             partition by    log_id 
                             order by        made_at
                         ) 
                    then 1
                end            is_restart
    
    from        logs
    
    +---------+--------+----------+------------+
    | made_at | log_id | sequence | is_restart |
    +---------+--------+----------+------------+
    | 1       | 206480 | 1        |            |
    +---------+--------+----------+------------+
    | 2       | 206480 | 1        |            |
    +---------+--------+----------+------------+
    | 3       | 206480 | 2        |            |
    +---------+--------+----------+------------+
    | 4       | 206480 | 3        |            |
    +---------+--------+----------+------------+
    | 5       | 206480 | 1        | 1          |
    +---------+--------+----------+------------+
    | 6       | 206480 | 2        |            |
    +---------+--------+----------+------------+
    | 7       | 206480 | 4        |            |
    +---------+--------+----------+------------+
    | 8       | 206480 | 5        |            |
    +---------+--------+----------+------------+
    | 9       | 206480 | 1        | 1          |
    +---------+--------+----------+------------+
    | 10      | 206480 | 2        |            |
    +---------+--------+----------+------------+
    | 11      | 206481 | 1        |            |
    +---------+--------+----------+------------+
    | 12      | 206481 | 2        |            |
    +---------+--------+----------+------------+
    | 13      | 206481 | 3        |            |
    +---------+--------+----------+------------+
    | 14      | 206481 | 4        |            |
    +---------+--------+----------+------------+

  • Compute a "running count" of the restarts (is_restart), analogous to a "running total".
    COUNT ignores NULLs, so only the rows flagged as restarts increment it.
    Rows that belong to the same group end up with the same count (a.k.a. restart_id).
    The "order by" inside the COUNT window implies range between unbounded preceding and current row.

    select      log_id 
               ,sequence
               ,count (is_restart) over
                (
                    partition by    log_id 
                    order by        made_at
                ) as restart_id
    
    from        (...) l

  • Group by log_id and restart_id and aggregate the sequences.

    select      log_id
               ,array_agg (sequence)
    
    from       (...) l
    
    group by    log_id      
               ,restart_id
    
    order by    log_id      
               ,restart_id
    ;

    +--------+-----------+
    | log_id | array_agg |
    +--------+-----------+
    | 206480 | {1,1,2,3} |
    +--------+-----------+
    | 206480 | {1,2,4,5} |
    +--------+-----------+
    | 206480 | {1,2}     |
    +--------+-----------+
    | 206481 | {1,2,3,4} |
    +--------+-----------+
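
To look at the running count from the second step on its own, something like the following sketch may help. It writes the window frame out explicitly as `rows between unbounded preceding and current row` instead of relying on the default `range` frame. The two behave the same as long as `made_at` is unique within a `log_id`; with ties, the default `range` frame treats rows sharing a `made_at` value as peers and counts them together. The CTE name `flagged` is an illustrative choice, not part of the original answer.

```sql
with flagged as (
    -- Step 1: mark rows whose sequence is lower than the previous row's.
    select      made_at
               ,log_id
               ,sequence
               ,case
                    when sequence <
                         lag (sequence) over
                         (
                             partition by    log_id
                             order by        made_at
                         )
                    then 1
                end            is_restart

    from        logs
)
-- Step 2: running count of restarts, with the frame spelled out.
select      made_at
           ,log_id
           ,sequence
           ,count (is_restart) over
            (
                partition by    log_id
                order by        made_at
                rows between unbounded preceding and current row
            ) as restart_id

from        flagged

order by    log_id
           ,made_at
;
```

On the sample data, restart_id should come out as 0 for made_at 1 to 4, 1 for made_at 5 to 8, and 2 for made_at 9 and 10 within log_id 206480, and as 0 for every row of log_id 206481.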