Efficient time series querying in Postgres

The*_*ous 8 sql postgresql

I have a table in my PG database that looks somewhat like this:

id | widget_id | for_date | score |

Each widget I reference has many of these rows. There is never more than one row per widget per day, but there are gaps.

The result I want contains every widget for every date since X. The dates are brought in via generate_series:

 SELECT date.date::date
   FROM generate_series('2012-01-01'::timestamp with time zone,'now'::text::date::timestamp with time zone, '1 day') date(date)
 ORDER BY date.date DESC;

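If there is no entry for a date for a given widget_id, I want to use the previous one. So if widget 1337 has no entry on 2012-05-10 but has one on 2012-05-08, I want the result set to show the 2012-05-08 entry on 2012-05-10 as well: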

Actual data:
widget_id | for_date   | score
1336      | 2012-05-07 | 20
1337      | 2012-05-07 | 12
1337      | 2012-05-08 | 41
1337      | 2012-05-11 | 500

Desired output based on generate series:
widget_id | for_date   | score
1336      | 2012-05-07 | 20
1337      | 2012-05-07 | 12
1336      | 2012-05-08 | 20
1337      | 2012-05-08 | 41
1336      | 2012-05-09 | 20
1337      | 2012-05-09 | 41
1336      | 2012-05-10 | 20
1337      | 2012-05-10 | 41
1336      | 2012-05-11 | 20
1337      | 2012-05-11 | 500

Eventually I want to boil this down into a view, so that I have a consistent data set per day that I can query easily.

Edit: Made the sample data and the expected result set clearer

Clo*_*eto 8

SQL Fiddle

select
    widget_id,
    for_date,
    case
        when score is not null then score
        -- gap day: reuse the score of the row that opened this (widget_id, c) group
        else first_value(score) over (partition by widget_id, c order by for_date)
        end score
from (
    select
        a.widget_id,
        a.for_date,
        s.score,
        -- running count of non-null scores; it does not advance on gap days,
        -- so a real row and the gap days after it share the same c
        count(score) over(partition by a.widget_id order by a.for_date) c
    from (
        -- one row per widget per day between the first and last for_date
        select widget_id, g.d::date for_date
        from (
            select distinct widget_id
            from score
            ) s
            cross join
            generate_series(
                (select min(for_date) from score),
                (select max(for_date) from score),
                '1 day'
            ) g(d)
        ) a
        left join
        score s on a.widget_id = s.widget_id and a.for_date = s.for_date
) s
order by widget_id, for_date
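The query above relies on a running count of non-null scores to group each gap day with the last real row before it. A minimal sketch of just that step, assuming a throwaway CTE named sample (hypothetical, standing in for the score table) and a single widget:

```sql
-- "sample" is a hypothetical stand-in for the score table.
WITH sample(widget_id, for_date, score) AS (
   VALUES (1337, '2012-05-07'::date, 12)
        , (1337, '2012-05-08', 41)
        , (1337, '2012-05-11', 500)
)
SELECT g.d::date AS for_date
     , s.score
       -- count() skips NULLs, so c only advances on days that have a row
     , count(s.score) OVER (ORDER BY g.d) AS c
FROM   generate_series('2012-05-07'::date, '2012-05-11'::date, '1 day') g(d)
LEFT   JOIN sample s ON s.for_date = g.d::date
ORDER  BY g.d;
```

For the five days, c comes out as 1, 2, 2, 2, 3: the two gap days (2012-05-09 and 2012-05-10) share c = 2 with 2012-05-08, which is why first_value(score) within that group returns 41 for them.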


Erw*_*ter 7

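First of all, you can use a much simpler generate_series() table expression. It is equivalent to yours (except for the descending order, which contradicts the rest of your question anyway):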

SELECT generate_series('2012-01-01'::date, now()::date, '1d')::date

The type date is coerced to timestamptz automatically on input. The return type is timestamptz either way. I use a subquery below, so I can cast the output to date right away.
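A quick way to check the types for yourself, as a small sketch (the column aliases series_type and cast_type are made up for illustration):

```sql
-- pg_typeof() reports the data type of its argument.
SELECT pg_typeof(g)       AS series_type  -- type returned by generate_series() for date input
      ,pg_typeof(g::date) AS cast_type    -- type after the explicit cast
FROM   generate_series('2012-01-01'::date, '2012-01-03'::date, '1d') g;
```

Going by the description above, series_type should read timestamp with time zone and cast_type should read date on each of the three rows.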

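Next, max() as a window function returns exactly what you need: the highest value since the start of the frame, ignoring NULL values. Building on that, you get a radically simple query.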
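To see the running max() in isolation, here is a minimal sketch on inline data. The d and s lists are hypothetical stand-ins for the date series and for the dates that actually have a row:

```sql
SELECT d.day
      ,s.for_date                                    -- NULL on gap days
      ,max(s.for_date) OVER (ORDER BY d.day) AS effective_date
FROM  (VALUES ('2012-05-07'::date), ('2012-05-08'), ('2012-05-09')
            , ('2012-05-10'), ('2012-05-11')) d(day)
LEFT   JOIN (VALUES ('2012-05-08'::date), ('2012-05-11')) s(for_date)
       ON s.for_date = d.day
ORDER  BY d.day;
```

With the default window frame (everything up to the current row), effective_date is NULL for 2012-05-07, stays at 2012-05-08 for 2012-05-08 through 2012-05-10, and becomes 2012-05-11 on the last day. That is the "most recent date with an entry" for each day.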

For a given widget_id

Most likely faster than solutions involving CROSS JOIN or WITH RECURSIVE:

SELECT a.day, s.*
FROM  (
   SELECT d.day
         ,max(s.for_date) OVER (ORDER BY d.day) AS effective_date
   FROM  (
      SELECT generate_series('2012-01-01'::date, now()::date, '1d')::date
      ) d(day)
   LEFT   JOIN score s ON s.for_date = d.day
                      AND s.widget_id = 1337 -- "for a given widget_id"
   ) a
LEFT   JOIN score s ON s.for_date = a.effective_date
                   AND s.widget_id = 1337
ORDER  BY a.day;

-> sqlfiddle

With this query you can put any column from score you like into the final SELECT list. I chose s.* for simplicity. Pick your columns.

If you want your output to start with the first day that actually has a score, simply replace the last LEFT JOIN with JOIN.
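A small sketch of that difference on inline data, assuming a hypothetical CTE named sample in place of the score table, with the first entry one day after the series starts:

```sql
-- "sample" is a hypothetical stand-in for the score table.
WITH sample(widget_id, for_date, score) AS (
   VALUES (1337, '2012-05-08'::date, 41)
        , (1337, '2012-05-11', 500)
)
SELECT a.day, s.score
FROM  (
   SELECT d.day
         ,max(s.for_date) OVER (ORDER BY d.day) AS effective_date
   FROM  (
      SELECT generate_series('2012-05-07'::date, '2012-05-11'::date, '1d')::date
      ) d(day)
   LEFT   JOIN sample s ON s.for_date = d.day
   ) a
JOIN   sample s ON s.for_date = a.effective_date  -- inner join: days with no earlier entry drop out
ORDER  BY a.day;
```

Here 2012-05-07 has a NULL effective_date, so the inner join removes it and the output starts at 2012-05-08. With LEFT JOIN in the same place, 2012-05-07 would be kept with a NULL score.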

Generic form for all widget_ids

Here I use a CROSS JOIN to produce a row for every widget on every date.

SELECT a.day, a.widget_id, s.score
FROM  (
   SELECT d.day, w.widget_id
         ,max(s.for_date) OVER (PARTITION BY w.widget_id
                                ORDER BY d.day) AS effective_date
   FROM  (SELECT generate_series('2012-05-05'::date
                                ,'2012-05-15'::date, '1d')::date AS day) d
   CROSS  JOIN (SELECT DISTINCT widget_id FROM score) AS w
   LEFT   JOIN score s ON s.for_date = d.day AND s.widget_id = w.widget_id
   ) a
JOIN  score s ON s.for_date = a.effective_date
             AND s.widget_id = a.widget_id  -- instead of LEFT JOIN
ORDER BY a.day, a.widget_id;

-> sqlfiddle
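Since the question ultimately asks for a view, the generic query could be wrapped roughly like this. This is only a sketch: the view name widget_score_daily is made up, the column types in the CREATE TABLE are guesses from the question's table header, and the start date is the one used in the question.

```sql
-- Assumed table layout, inferred from the question; the types are guesses.
CREATE TABLE IF NOT EXISTS score (
   id        serial PRIMARY KEY
  ,widget_id int  NOT NULL
  ,for_date  date NOT NULL
  ,score     int  NOT NULL
  ,UNIQUE (widget_id, for_date)   -- at most one row per widget per day
);

-- Hypothetical view name. One row per widget per day, gaps filled
-- with the most recent earlier score.
CREATE VIEW widget_score_daily AS
SELECT a.day AS for_date, a.widget_id, s.score
FROM  (
   SELECT d.day, w.widget_id
         ,max(s.for_date) OVER (PARTITION BY w.widget_id
                                ORDER BY d.day) AS effective_date
   FROM  (SELECT generate_series('2012-01-01'::date
                                ,now()::date, '1d')::date AS day) d
   CROSS  JOIN (SELECT DISTINCT widget_id FROM score) AS w
   LEFT   JOIN score s ON s.for_date = d.day AND s.widget_id = w.widget_id
   ) a
JOIN   score s ON s.for_date = a.effective_date
              AND s.widget_id = a.widget_id;
```

It can then be queried like any table, for example:

```sql
SELECT * FROM widget_score_daily WHERE widget_id = 1337 ORDER BY for_date;
```

Because now() is evaluated when the view is queried, the series extends to the current day on every call. Whether this performs well enough on a large table is something to measure rather than assume.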