Create a table/view with the time span between log entries from a start/end event log

MJ.*_*MJ. 0 postgresql pivot redshift aggregate-filter

Specifically, I have an event table that records when users join or leave a team. It looks like this:

-------------------------------------
| user | event  | team | timestamp |
-------------------------------------
| A    | joined | 1    | 2016-1-1  |
| B    | joined | 1    | 2016-1-1  |
| C    | left   | 1    | 2016-1-1  |
| C    | joined | 2    | 2016-1-1  |
| A    | left   | 1    | 2016-1-2  |
| A    | joined | 2    | 2016-1-2  |
| B    | left   | 1    | 2016-1-3  |
| A    | left   | 2    | 2016-1-3  |
-------------------------------------

I need to restructure it so that it looks like this:

--------------------------------------
| user | team | joined    | left     |
--------------------------------------
| A    | 1    | 2016-1-1  | 2016-1-2 |
| A    | 2    | 2016-1-2  | 2016-1-3 |
| B    | 1    | 2016-1-1  | 2016-1-3 |
| C    | 1    | null      | 2016-1-1 |
| C    | 2    | 2016-1-1  | null     |
--------------------------------------

How can I do this?

For more detail: I am trying to do this in Amazon Redshift (PostgreSQL).

Erw*_*ter 5

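Assuming all columns are NOT NULL, and that a 'left' is never earlier than its associated 'joined'.

Simple case

If a user can join a team only once (ideally enforced with a UNIQUE constraint on ("user", team)), the solution is a simple GROUP BY that works in Redshift as well as in pretty much any RDBMS: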

SELECT "user", team
     , min(CASE WHEN event = 'joined' THEN timestamp END) AS joined
     , max(CASE WHEN event = 'left'   THEN timestamp END) AS "left"
FROM   event
GROUP  BY "user", team
ORDER  BY "user", joined NULLS FIRST;

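Note the NULLS FIRST clause. It seems you want open-ended starts (joined IS NULL) sorted first. Redshift supports this as well.

Other than that, this is the most basic form of a crosstab / pivot query.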
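To make the pivot concrete, here is a minimal sketch that runs the same conditional aggregation against the sample rows from the question. It uses Python's built-in sqlite3 as a stand-in for Redshift, which is an assumption of this sketch, as are the names `EVENTS`, `PIVOT_SQL` and `pivot_simple` and the zero-padded ISO dates. NULLS FIRST is left out because SQLite already sorts NULL first in ascending order.

```python
import sqlite3

# Sample rows from the question: (user, event, team, timestamp).
# Dates are zero-padded so that text comparison matches chronological order.
EVENTS = [
    ("A", "joined", 1, "2016-01-01"),
    ("B", "joined", 1, "2016-01-01"),
    ("C", "left",   1, "2016-01-01"),
    ("C", "joined", 2, "2016-01-01"),
    ("A", "left",   1, "2016-01-02"),
    ("A", "joined", 2, "2016-01-02"),
    ("B", "left",   1, "2016-01-03"),
    ("A", "left",   2, "2016-01-03"),
]

# Same shape as the GROUP BY query above; SQLite puts NULL first by default,
# so the NULLS FIRST keyword is omitted here.
PIVOT_SQL = """
SELECT "user", team
     , min(CASE WHEN event = 'joined' THEN "timestamp" END) AS joined
     , max(CASE WHEN event = 'left'   THEN "timestamp" END) AS "left"
FROM   event
GROUP  BY "user", team
ORDER  BY "user", joined
"""

def pivot_simple(rows):
    """Load rows into an in-memory table and return one row per (user, team)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        'CREATE TABLE event ("user" TEXT, event TEXT, team INTEGER, "timestamp" TEXT)'
    )
    con.executemany("INSERT INTO event VALUES (?, ?, ?, ?)", rows)
    result = con.execute(PIVOT_SQL).fetchall()
    con.close()
    return result

if __name__ == "__main__":
    for row in pivot_simple(EVENTS):
        print(row)
```

Each CASE expression yields NULL for rows of the other event type, and min/max ignore NULL, so a missing 'joined' or 'left' simply comes out as NULL (None in Python) in that column.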

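Not so simple

Judging from your column names and sample data, it may not be that simple. If users can join a team multiple times (non-overlapping), you have to do more work. You don't want multiple team memberships merged into one row, as happens in a related answer.

Instead, you have to pair adjacent 'joined' and 'left' rows somehow. There are many ways ...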

Postgres 9.4+

In modern Postgres, I like this one best:

SELECT "user", team
     , min(timestamp) FILTER (WHERE event = 'joined') AS joined
     , max(timestamp) FILTER (WHERE event = 'left'  ) AS "left"
FROM  (
   SELECT *, count(*) FILTER (WHERE event = 'joined')
                      OVER (PARTITION BY "user", team ORDER BY timestamp) AS ct
   FROM   event
   ) sub
GROUP  BY "user", team, ct
ORDER  BY "user", joined NULLS FIRST;

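This uses the aggregate FILTER clause in both the window function and the aggregate functions.

The window function counts how many times the same user has joined the same team so far. That running count (ct) is the same for a 'joined' row and the 'left' rows that follow it, so it serves as a key for grouping adjacent rows. It also works when a 'joined' is missing at the start or a 'left' is missing at the end.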
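The running-count idea can be traced outside SQL. The following is only an illustrative Python sketch, not the query itself: `pair_memberships` is a hypothetical helper, and it breaks timestamp ties by event name ('joined' before 'left'), which is an assumption borrowed from the Redshift variant of the query.

```python
from itertools import groupby

def pair_memberships(rows):
    """rows: (user, event, team, timestamp) tuples.

    Returns (user, team, joined, left) tuples, one per membership span.
    """
    # Mimic PARTITION BY user, team ORDER BY timestamp, event:
    # 'joined' sorts before 'left' when timestamps are equal.
    ordered = sorted(rows, key=lambda r: (r[0], r[2], r[3], r[1]))
    spans = {}
    for (user, team), part in groupby(ordered, key=lambda r: (r[0], r[2])):
        ct = 0  # running count of 'joined' rows within this partition
        for _, event, _, ts in part:
            if event == "joined":
                ct += 1
            span = spans.setdefault((user, team, ct), [None, None])
            if event == "joined":
                span[0] = ts  # at most one 'joined' per ct value
            else:
                # keep the latest 'left' of the group, like max(...)
                span[1] = ts if span[1] is None else max(span[1], ts)
    result = [(u, t, j, l) for (u, t, _), (j, l) in spans.items()]
    # ORDER BY user, joined NULLS FIRST
    result.sort(key=lambda r: (r[0], r[2] is not None, r[2] or ""))
    return result

if __name__ == "__main__":
    sample = [
        ("A", "joined", 1, "2016-01-01"),
        ("A", "left",   1, "2016-01-02"),
        ("A", "joined", 1, "2016-01-05"),  # A rejoins team 1
        ("A", "left",   1, "2016-01-07"),
        ("C", "left",   1, "2016-01-01"),  # no matching 'joined'
    ]
    for row in pair_memberships(sample):
        print(row)
```

A 'left' row with no preceding 'joined' lands in the group with ct = 0, which is how an open-ended start ends up as its own row with a NULL joined value.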

Redshift

... does not support the newer FILTER clause. We can substitute a plain old CASE expression:

SELECT "user", team
     , min(CASE WHEN event = 'joined' THEN timestamp END) AS joined
     , max(CASE WHEN event = 'left'   THEN timestamp END) AS "left"
FROM  (
   SELECT *, count(CASE WHEN event = 'joined' THEN 1 END)
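                      OVER (PARTITION BY "user", team ORDER BY timestamp, event
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ct  -- Redshift requires an explicit frame with ORDER BY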
   FROM   event
   ) sub
GROUP  BY "user", team, ct
ORDER  BY "user", joined NULLS FIRST;

SQL Fiddle.


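Aside: even though Redshift (or Postgres) allows it, you should not use reserved words such as "user" or "left" as identifiers. That is why the queries above have to double-quote them.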