Redshift query to combine rows when the data in a table is contiguous

ash*_*cse 4 sql database gaps-and-islands amazon-redshift

I need to combine results in Redshift when the data is contiguous. I have the following table, where user_id and product_id are varchar, and login_time and log_out_time are timestamps.

user_id    product_id   login_time                log_out_time
----------------------------------------------------------------------
ashok      facebook     1/1/2017 1:00:00 AM       1/1/2017 2:00:00 AM
ashok      facebook     1/1/2017 2:00:00 AM       1/1/2017 3:00:00 AM
ashok      facebook     1/1/2017 3:00:00 AM       1/1/2017 4:00:00 AM
ashok      linked_in    1/1/2017 5:00:00 AM       1/1/2017 6:00:00 AM
ashok      linked_in    1/1/2017 6:00:00 AM       1/1/2017 7:00:00 AM
ashok      facebook     1/1/2017 8:00:00 AM       1/1/2017 9:00:00 AM
ram        facebook     1/1/2017 9:00:00 AM       1/1/2017 10:00:00 AM
ashok      linked_in    1/1/2017 7:00:00 AM       1/1/2017 8:00:00 AM

I need to combine the rows when the data for a given user_id and product is contiguous, so that my output looks like this:

user_id    product_id   login_time                log_out_time
----------------------------------------------------------------------
ashok      facebook     1/1/2017 1:00:00 AM       1/1/2017 4:00:00 AM
ashok      facebook     1/1/2017 8:00:00 AM       1/1/2017 9:00:00 AM
ashok      linked_in    1/1/2017 5:00:00 AM       1/1/2017 8:00:00 AM
ram        facebook     1/1/2017 9:00:00 AM       1/1/2017 10:00:00 AM

I tried the following query, but it did not help:

SELECT user_id, product_id, MIN(login_time), MAX(log_out_time) FROM TABLE_NAME GROUP BY user_id, product_id

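The query above does not give the output I need, because it has no logic to check whether the rows are contiguous. I need a query that does this without any user-defined functions, though I can use any of Redshift's built-in functions.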

Gor*_*off 5

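You can use lag() to identify where each group starts, then a cumulative sum to assign a group number to each row, and then group by to aggregate the results: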

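select user_id, product_id,
       min(login_time) as login_time,
       max(log_out_time) as log_out_time
from (select t.*,
             -- running count of group starts: a row starts a new group
             -- unless it logs in exactly when the previous row logged out
             sum(case when prev_lt = login_time then 0 else 1 end) over
                 (partition by user_id, product_id
                  order by login_time
                  rows between unbounded preceding and current row
                 ) as grp
      from (select t.*,
                   -- log_out_time of the previous row for the same user and product
                   lag(log_out_time) over (partition by user_id, product_id order by login_time) as prev_lt
            from t
           ) t
     ) t
group by user_id, product_id, grp;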
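
To make the logic of the query easier to follow, here is a minimal Python sketch of the same idea (not Redshift code), run against the sample rows from the question. It sorts the rows within each (user_id, product_id) partition, remembers the previous log_out_time in the way lag() does, and starts a new group whenever the current login_time differs from it. The names `at`, `ROWS`, and `merge_sessions` are hypothetical helpers introduced only for this illustration.

```python
from datetime import datetime
from itertools import groupby


def at(hour):
    """Hypothetical helper: a timestamp on 1 Jan 2017 at the given hour."""
    return datetime(2017, 1, 1, hour)


# Sample rows from the question: (user_id, product_id, login_time, log_out_time).
ROWS = [
    ("ashok", "facebook", at(1), at(2)),
    ("ashok", "facebook", at(2), at(3)),
    ("ashok", "facebook", at(3), at(4)),
    ("ashok", "linked_in", at(5), at(6)),
    ("ashok", "linked_in", at(6), at(7)),
    ("ashok", "facebook", at(8), at(9)),
    ("ram", "facebook", at(9), at(10)),
    ("ashok", "linked_in", at(7), at(8)),
]


def merge_sessions(rows):
    """Merge back-to-back sessions per (user_id, product_id)."""
    merged = []
    # Equivalent of PARTITION BY user_id, product_id ORDER BY login_time.
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    for (user, product), part in groupby(ordered, key=lambda r: (r[0], r[1])):
        prev_logout = None  # plays the role of lag(log_out_time)
        for _, _, login, logout in part:
            if prev_logout is not None and login == prev_logout:
                # Contiguous with the previous row: extend the current group.
                merged[-1][3] = max(merged[-1][3], logout)
            else:
                # First row of the partition, or a gap: start a new group.
                merged.append([user, product, login, logout])
            prev_logout = logout
    return [tuple(m) for m in merged]


if __name__ == "__main__":
    for row in merge_sessions(ROWS):
        print(row)
```

On the sample data this yields the four rows of the desired output: ashok/facebook 1:00 to 4:00, ashok/facebook 8:00 to 9:00, ashok/linked_in 5:00 to 8:00, and ram/facebook 9:00 to 10:00. Like the SQL, it only treats rows as contiguous when login_time exactly equals the previous log_out_time; overlapping sessions would need a different comparison.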