Dan*_*itz 6 postgresql gaps-and-islands
I have a table with the following data, using Postgres 9.6:
log_id | sequence | made_at (timestamp)
206480 | 1        | 1
206480 | 1        | 2
206480 | 2        | 3
206480 | 3        | 4
206480 | 1        | 5
206480 | 2        | 6
206480 | 4        | 7
206480 | 5        | 8
206480 | 1        | 9
206480 | 2        | 10
206481 | 1        | 11
206481 | 2        | 12
206481 | 3        | 13
206481 | 4        | 14
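I need to group and aggregate by ID so that I get arrays of the possible sequences. In the end, I want the data to look like this:
log_id | sequence
206480 | {1,1,2,3}
206480 | {1,2,4,5}
206480 | {1,2}
206481 | {1,2,3,4}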
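I want a new row (with its sequence) whenever:
log_id changes; or
the sequence value drops below the previous one (the sequence restarts).
There is also another column that defines the ordering (a timestamp), but it lives in another table (I join the two and use that timestamp). To keep things simple I left it out, but we can assume the column is named made_at.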
select log_id
      ,array_agg (sequence)
from  (select log_id
             ,sequence
             ,count (is_restart) over
              (
                  partition by log_id
                  order by     made_at
              ) as restart_id
       from  (select made_at
                    ,log_id
                    ,sequence
                    ,case
                         when sequence <
                              lag (sequence) over
                              (
                                  partition by log_id
                                  order by     made_at
                              )
                         then 1
                     end is_restart
              from   logs
             ) l
      ) l
group by log_id
        ,restart_id
order by log_id
        ,restart_id
;
Identify restarts by comparing the current sequence with the previous one (LAG).
select made_at
      ,log_id
      ,sequence
      ,case
           when sequence <
                lag (sequence) over
                (
                    partition by log_id
                    order by     made_at
                )
           then 1
       end is_restart
from   logs
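To make the LAG comparison concrete outside SQL, here is a minimal Python sketch of the same idea, run on the sample rows from the question. The function name flag_restarts and the tuple layout are illustrative assumptions, not part of the original answer.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (log_id, sequence, made_at)
rows = [
    (206480, 1, 1), (206480, 1, 2), (206480, 2, 3), (206480, 3, 4),
    (206480, 1, 5), (206480, 2, 6), (206480, 4, 7), (206480, 5, 8),
    (206480, 1, 9), (206480, 2, 10),
    (206481, 1, 11), (206481, 2, 12), (206481, 3, 13), (206481, 4, 14),
]

def flag_restarts(rows):
    """Return (made_at, log_id, sequence, is_restart) tuples.

    is_restart is 1 when sequence is lower than the previous sequence
    within the same log_id (ordered by made_at), otherwise None,
    mirroring the CASE ... LAG expression.
    """
    flagged = []
    # Sort by log_id, then made_at: the "partition by / order by" of the window
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for _, partition in groupby(ordered, key=itemgetter(0)):
        previous = None  # LAG yields NULL on the first row of a partition
        for log_id, sequence, made_at in partition:
            is_restart = 1 if previous is not None and sequence < previous else None
            flagged.append((made_at, log_id, sequence, is_restart))
            previous = sequence
    return flagged

flagged = flag_restarts(rows)
restart_times = [made_at for made_at, _, _, flag in flagged if flag == 1]
print(restart_times)  # [5, 9]
```

Only the rows at made_at 5 and 9 are flagged. The repeated 1 at made_at 2 is not a restart, because the comparison is strictly "less than".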
For reference, the full query returns:
+--------+-----------+
| log_id | array_agg |
+--------+-----------+
| 206480 | {1,1,2,3} |
+--------+-----------+
| 206480 | {1,2,4,5} |
+--------+-----------+
| 206480 | {1,2} |
+--------+-----------+
| 206481 | {1,2,3,4} |
+--------+-----------+
Perform a "running count" of the restarts (is_restart), similar to a "running total".
Rows that belong to the same group get the same count (AKA restart_id).
The ORDER BY in the COUNT window implies RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
select log_id
      ,sequence
      ,count (is_restart) over
       (
           partition by log_id
           order by     made_at
       ) as restart_id
from  (...) l
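The running count can be sketched in Python as a cumulative sum that skips NULLs. The flag list below is the is_restart column for log_id 206480 in made_at order. The sketch assumes made_at is unique within a log_id, so the default RANGE frame behaves like a row-by-row running count.

```python
from itertools import accumulate

# is_restart flags for log_id 206480, ordered by made_at (None stands for NULL)
is_restart = [None, None, None, None, 1, None, None, None, 1, None]

# COUNT ignores NULLs, so only non-NULL flags increase the running count.
restart_id = list(accumulate(0 if flag is None else 1 for flag in is_restart))
print(restart_id)  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
```

Each restart bumps the counter by one, so every run of rows between two restarts shares one restart_id.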
Group by log_id and restart_id and aggregate the sequences.
select log_id
      ,array_agg (sequence)
from  (...) l
group by log_id
        ,restart_id
order by log_id
        ,restart_id
;
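The final grouping step can be sketched in Python as follows. The input tuples (log_id, restart_id, sequence) are what the previous step would produce for the sample data, listed in made_at order; the variable names are illustrative.

```python
from collections import defaultdict

# (log_id, restart_id, sequence) rows, already ordered by made_at
rows = [
    (206480, 0, 1), (206480, 0, 1), (206480, 0, 2), (206480, 0, 3),
    (206480, 1, 1), (206480, 1, 2), (206480, 1, 4), (206480, 1, 5),
    (206480, 2, 1), (206480, 2, 2),
    (206481, 0, 1), (206481, 0, 2), (206481, 0, 3), (206481, 0, 4),
]

# Collect sequences per (log_id, restart_id), like GROUP BY + array_agg
groups = defaultdict(list)
for log_id, restart_id, sequence in rows:
    groups[(log_id, restart_id)].append(sequence)

# Order by log_id, restart_id and drop restart_id from the output
result = [(log_id, seqs) for (log_id, _), seqs in sorted(groups.items())]
for log_id, seqs in result:
    print(log_id, seqs)
```

The Python list keeps the input order, but in SQL the element order inside array_agg is not guaranteed unless it is requested. If the order matters, one option is array_agg (sequence order by made_at), which requires made_at to be selected in the subquery as well.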
For reference, the first step (the is_restart flag) returns:
+---------+--------+----------+------------+
| made_at | log_id | sequence | is_restart |
+---------+--------+----------+------------+
| 1 | 206480 | 1 | |
+---------+--------+----------+------------+
| 2 | 206480 | 1 | |
+---------+--------+----------+------------+
| 3 | 206480 | 2 | |
+---------+--------+----------+------------+
| 4 | 206480 | 3 | |
+---------+--------+----------+------------+
| 5 | 206480 | 1 | 1 |
+---------+--------+----------+------------+
| 6 | 206480 | 2 | |
+---------+--------+----------+------------+
| 7 | 206480 | 4 | |
+---------+--------+----------+------------+
| 8 | 206480 | 5 | |
+---------+--------+----------+------------+
| 9 | 206480 | 1 | 1 |
+---------+--------+----------+------------+
| 10 | 206480 | 2 | |
+---------+--------+----------+------------+
| 11 | 206481 | 1 | |
+---------+--------+----------+------------+
| 12 | 206481 | 2 | |
+---------+--------+----------+------------+
| 13 | 206481 | 3 | |
+---------+--------+----------+------------+
| 14 | 206481 | 4 | |
+---------+--------+----------+------------+