Trying to calculate cumulative distinct entities using Redshift SQL

Ane*_*apu 5 sql amazon-redshift

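I'm trying to get a cumulative count of distinct objects over a time series in Redshift. The straightforward approach would be COUNT(DISTINCT myfield) OVER (ORDER BY timefield DESC ROWS UNBOUNDED PRECEDING), but Redshift rejects it with a "Window definition is not supported" error.

For example, the code below tries to find the cumulative number of distinct users for every week, from the first week until now. However, I get a "Window function not supported" error.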

SELECT user_time.weeks_ago, 
       COUNT(distinct user_time.user_id) OVER
            (ORDER BY weeks_ago desc ROWS UNBOUNDED PRECEDING) as count
FROM   (SELECT FLOOR(EXTRACT(DAY FROM sysdate - ev.time) / 7) AS weeks_ago,
               ev.user_id as user_id
        FROM events as ev
        WHERE ev.action='some_user_action') as user_time

The goal is to build a cumulative time series of the unique users who have performed an action. Any ideas on how to do this?

alb*_*lin 5

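Here is how to apply this to the example cited here. I have also added another row that duplicates 'table' for '2015-01-01', to show how distinct values are counted.

The author of that example is wrong about the solution, but I am just using his example data.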

create table public.test
(
  "date" date,
  item varchar(8),
  measure int
);

insert into public.test
    values
      ('2015-01-01', 'table',   12),
      ('2015-01-01', 'table',   120),
      ('2015-01-01', 'chair',   51),
      ('2015-01-01', 'lamp',    8),
      ('2015-01-02', 'table',   17),
      ('2015-01-02', 'chair',   72),
      ('2015-01-02', 'lamp',    23),
      ('2015-01-02', 'bed',     1),
      ('2015-01-02', 'dresser', 2),
      ('2015-01-03', 'bed',     1);

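WITH x AS (
    SELECT
      *,
      -- rank each item within its date; duplicate items share a rank
      DENSE_RANK()
      OVER (PARTITION BY "date"
        ORDER BY item) AS item_rank
    FROM public.test
)
SELECT
  "date",
  item,
  measure,
  -- the highest dense rank per date equals the number of distinct items
  MAX(item_rank)
  OVER (PARTITION BY "date") AS distinct_items
FROM x
ORDER BY 1;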

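The CTE computes the dense rank of each item within each date. The main query then takes the maximum of that dense rank per date, which is the distinct count of items for each date.

You need DENSE_RANK rather than plain RANK to count distinct values: RANK leaves gaps after ties, so its maximum can exceed the number of distinct values.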
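To check that the maximum dense rank really matches a distinct count, here is a minimal sketch that runs the same idea from Python against an in-memory SQLite database rather than Redshift. That substitution is an assumption made only so the example is self-contained; it needs SQLite 3.25 or later for window functions. The table name `test` (without the `public` schema) and the aliases `item_rank` and `distinct_items` are illustrative.

```python
import sqlite3

# In-memory SQLite stands in for Redshift here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE test ("date" TEXT, item TEXT, measure INTEGER)')
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        ("2015-01-01", "table", 12),
        ("2015-01-01", "table", 120),  # duplicate item on the same date
        ("2015-01-01", "chair", 51),
        ("2015-01-01", "lamp", 8),
        ("2015-01-02", "table", 17),
        ("2015-01-02", "chair", 72),
        ("2015-01-02", "lamp", 23),
        ("2015-01-02", "bed", 1),
        ("2015-01-02", "dresser", 2),
        ("2015-01-03", "bed", 1),
    ],
)

# Distinct count per date via MAX(DENSE_RANK()), collapsed to one row per date.
windowed = conn.execute(
    """
    WITH x AS (
        SELECT "date",
               DENSE_RANK() OVER (PARTITION BY "date" ORDER BY item) AS item_rank
        FROM test
    )
    SELECT DISTINCT "date",
           MAX(item_rank) OVER (PARTITION BY "date") AS distinct_items
    FROM x
    ORDER BY "date"
    """
).fetchall()

# Reference result using an ordinary grouped COUNT(DISTINCT ...).
grouped = conn.execute(
    'SELECT "date", COUNT(DISTINCT item) FROM test GROUP BY "date" ORDER BY "date"'
).fetchall()

print(windowed)
print(windowed == grouped)
```

The duplicated 'table' row on '2015-01-01' shares a rank with the first one, so that date is counted as three distinct items, not four.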


Ane*_*apu 3

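Figured out the answer. The trick turns out to be a set of nested subqueries: the innermost one computes the week of each user's first action, the middle one counts the number of first-time actions per week, and the outer query performs a cumulative sum over the time series: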

SELECT engaged_per_week.week AS week,
       -- running total, oldest week (largest weeks-ago value) first
       SUM(engaged_per_week.total) OVER (ORDER BY engaged_per_week.week DESC ROWS UNBOUNDED PRECEDING) AS total
FROM
    -- COUNT OF FIRST TIME ENGAGEMENTS PER WEEK
    (SELECT engaged.first_week AS week,
            COUNT(engaged.first_week) AS total
     FROM
        -- WEEK OF FIRST ENGAGEMENT FOR EACH USER
        (SELECT  MAX(FLOOR(EXTRACT(DAY FROM sysdate - ev.time) / 7)) AS first_week
         FROM     events ev
         WHERE    ev.action = 'some_user_action'
         GROUP BY ev.user_id) AS engaged
     GROUP BY engaged.first_week) AS engaged_per_week
ORDER BY week DESC;
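The nested-subquery approach can be tried on a small data set. The sketch below uses Python with an in-memory SQLite database instead of Redshift (an assumption, so that it runs anywhere; SQLite 3.25 or later is needed for window functions). The `events` rows are made up, and the `weeks_ago` column is assumed to be precomputed in place of the `FLOOR(EXTRACT(DAY FROM sysdate - ev.time) / 7)` expression.

```python
import sqlite3

# In-memory SQLite stands in for Redshift; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, weeks_ago INTEGER, action TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, 3, "some_user_action"),
        (2, 3, "some_user_action"),
        (5, 3, "other_action"),      # filtered out by the WHERE clause
        (1, 2, "some_user_action"),
        (3, 2, "some_user_action"),
        (2, 1, "some_user_action"),  # week 1 has only returning users
        (4, 0, "some_user_action"),
        (1, 0, "some_user_action"),
        (3, 0, "some_user_action"),
    ],
)

cumulative = conn.execute(
    """
    SELECT first_week AS week,
           SUM(new_users) OVER (
               ORDER BY first_week DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS total
    FROM (
        -- number of users whose first action falls in each week
        SELECT first_week, COUNT(*) AS new_users
        FROM (
            -- largest weeks_ago value per user = week of first action
            SELECT user_id, MAX(weeks_ago) AS first_week
            FROM events
            WHERE action = 'some_user_action'
            GROUP BY user_id
        ) AS engaged
        GROUP BY first_week
    ) AS engaged_per_week
    ORDER BY week DESC
    """
).fetchall()

# Brute-force check: distinct users seen in this week or any earlier one.
def distinct_users_up_to(week):
    row = conn.execute(
        "SELECT COUNT(DISTINCT user_id) FROM events "
        "WHERE action = 'some_user_action' AND weeks_ago >= ?",
        (week,),
    ).fetchone()
    return row[0]

print(cumulative)
print(all(total == distinct_users_up_to(week) for week, total in cumulative))
```

One thing this sketch brings out: a week in which no user acts for the first time (week 1 in this data) produces no row at all. If every week has to appear in the series, the result would need to be joined against a full list of weeks.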