Calculating a rolling sum over 7 consecutive days on PostgreSQL

jos*_*gna 12 postgresql aggregate window-functions postgresql-9.4

I need to get a 7-day rolling sum for each row (1 row per day).

For example:

| Date       | Count | 7-Day Rolling Sum |
------------------------------------------
| 2016-02-01 | 1     | 1
| 2016-02-02 | 1     | 2
| 2016-02-03 | 2     | 4
| 2016-02-04 | 2     | 6
| 2016-02-05 | 2     | 8
| 2016-02-06 | 2     | 10
| 2016-02-07 | 2     | 12
| 2016-02-08 | 2     | 13 --> here we start summing from 02-02
| 2016-02-09 | 2     | 14 --> here we start summing from 02-03
| 2016-02-10 | 5     | 17 --> here we start summing from 02-04

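I need this in a query that returns one row per day with the 7-day rolling sum and the date of the last day of the summed range. For example: day = 2016-02-10, sum = 17.

So far I have this, but it doesn't fully work: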

DO
$do$
DECLARE 
    curr_date date;
    num bigint;
BEGIN
FOR curr_date IN (SELECT date_trunc('day', d)::date FROM generate_series(CURRENT_DATE-31, CURRENT_DATE-1, '1 day'::interval) d)
LOOP 
    SELECT curr_date, SUM(count)
    FROM generate_series (curr_date-8, curr_date-1, '1 day'::interval) d
    LEFT JOIN m.ping AS p ON p.date = d
    LEFT JOIN m.ping_type AS pt ON pt.id = p.ping_type_id
    LEFT JOIN m.ping_frequency AS pf ON pf.id = p.ping_frequency_id
    WHERE
        pt.url_slug = 'active' AND
        pf.url_slug = 'weekly';
END LOOP;
END
$do$;

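I'm using PostgreSQL 9.4.5. There can be multiple rows with the same date. If there are gaps (a day is missing), the window should still cover 7 consecutive calendar days.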

hru*_*ske 15

By far the cleanest solution is to use the window function sum with rows between:

with days as (
        SELECT date_trunc('day', d)::date as day
        FROM generate_series(CURRENT_DATE-31, CURRENT_DATE-1, '1 day'::interval) d ),
    counts as (
        select 
            days.day,
            sum((random()*5)::integer) num
        FROM days
        -- left join other tables here to get counts, I'm using random
        group by days.day
    )
select
    day,
    num,
    sum(num) over (order by day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
from counts
order by day;

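The important part is to generate the time frame in the days CTE and join onto it, so that days without any data are not skipped. ROWS BETWEEN 6 PRECEDING counts rows, not calendar days, so it only equals a 7-day window when there is exactly one row per day.

Example

For example, if I create some test data with 20 records spread over the last 14 days: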
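To illustrate why the calendar join matters, here is a minimal sketch with a few hard-coded rows. The sample CTE and its values are hypothetical and exist only for this illustration; no row exists for 2016-02-03 through 2016-02-08. The coalesce is an optional addition that reports empty days as 0 rather than NULL.

```sql
-- Hypothetical sample data with a gap between 2016-02-02 and 2016-02-09.
with sample(day, num) as (
    values ('2016-02-01'::date, 1),
           ('2016-02-02'::date, 1),
           ('2016-02-09'::date, 2)
),
days as (
    -- One row per calendar day, whether or not sample has data for it.
    select d::date as day
    from generate_series('2016-02-01'::date, '2016-02-09'::date, '1 day'::interval) d
),
counts as (
    -- Days missing from sample get a count of 0.
    select days.day, coalesce(sum(s.num), 0) as num
    from days
    left join sample s on s.day = days.day
    group by days.day
)
select
    day,
    num,
    sum(num) over (order by day rows between 6 preceding and current row) as rolling_sum
from counts
order by day;
```

With the calendar in place, the window for 2016-02-09 covers 2016-02-03 through 2016-02-09, so its rolling sum is 2. Running the same window directly over the three sample rows would add up all three of them and give 4.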

SELECT (current_date - ((random()*14)::integer::text || 'days')::interval)::date as day, (random()*7)::integer as num
into test_data from generate_series(1, 20);

And also add one value before that period:

insert into test_data values ((current_date - '25 days'::interval), 5);

Then use the query above:

with days as (
        SELECT date_trunc('day', d)::date as day
        FROM generate_series(CURRENT_DATE-31, CURRENT_DATE-1, '1 day'::interval) d ),
    counts as (
        select 
            days.day,
            sum(t.num) num
        FROM days
        left join test_data t on t.day = days.day
        group by days.day
    )
select
    day,
    num,
    sum(num) over (order by day rows between 6 preceding and current row)
from counts
order by day;

And get the results for the whole month:

    day     | num | sum 
------------+-----+-----
 2016-01-31 |     |    
 2016-02-01 |     |    
 2016-02-02 |     |    
 2016-02-03 |     |    
 2016-02-04 |     |    
 2016-02-05 |     |    
 2016-02-06 |   5 |   5
 2016-02-07 |     |   5
 2016-02-08 |     |   5
 2016-02-09 |     |   5
 2016-02-10 |     |   5
 2016-02-11 |     |   5
 2016-02-12 |     |   5
 2016-02-13 |     |    
 2016-02-14 |     |    
 2016-02-15 |     |    
 2016-02-16 |     |    
 2016-02-17 |     |    
 2016-02-18 |   2 |   2
 2016-02-19 |   5 |   7
 2016-02-20 |     |   7
 2016-02-21 |   4 |  11
 2016-02-22 |  15 |  26
 2016-02-23 |   1 |  27
 2016-02-24 |   1 |  28
 2016-02-25 |   2 |  28
 2016-02-26 |   4 |  27
 2016-02-27 |   9 |  36
 2016-02-28 |   5 |  37
 2016-02-29 |  11 |  33
 2016-03-01 |   5 |  37
(31 rows)
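The blank cells in this output are NULLs: sum returns NULL when every input in the group or window frame is NULL. If zeros are preferred, one possible variation (a sketch, not part of the original query) wraps the daily total in coalesce. The rolling_sum alias is also an addition for readability.

```sql
with days as (
        select date_trunc('day', d)::date as day
        from generate_series(current_date - 31, current_date - 1, '1 day'::interval) d ),
    counts as (
        select
            days.day,
            coalesce(sum(t.num), 0) as num  -- 0 for days with no rows in test_data
        from days
        left join test_data t on t.day = days.day
        group by days.day
    )
select
    day,
    num,
    sum(num) over (order by day rows between 6 preceding and current row) as rolling_sum
from counts
order by day;
```

Because every day now contributes a number, both the num and rolling_sum columns show 0 instead of a blank for stretches without data.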