I have a table in my PG database that looks a little like this:
id | widget_id | for_date | score |
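Each referenced widget has many of these rows. There is never more than one row per widget per day, but there are gaps.
What I want is a result that contains every widget for each date since X. The dates come from generate_series: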
SELECT date.date::date
FROM generate_series('2012-01-01'::timestamp with time zone,'now'::text::date::timestamp with time zone, '1 day') date(date)
ORDER BY date.date DESC;
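If there is no entry for a given widget_id on some date, I want to use the previous one. So if widget 1337 has no entry on 2012-05-10 but does have one on 2012-05-08, I want the result set to show the 2012-05-08 entry for 2012-05-10 as well: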
Actual data:
widget_id | for_date | score
1336 | 2012-05-07 | 20
1337 | 2012-05-07 | 12
1337 | 2012-05-08 | 41
1337 | 2012-05-11 | 500
Desired output based on generate series:
widget_id | for_date | score
1336 | 2012-05-07 | 20
1337 | 2012-05-07 | 12
1336 | 2012-05-08 | 20
1337 | 2012-05-08 | 41
1336 | 2012-05-09 | 20
1337 | 2012-05-09 | 41
1336 | 2012-05-10 | 20
1337 | 2012-05-10 | 41
1336 | 2012-05-11 | 20
1337 | 2012-05-11 | 500
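Eventually I want to boil this down into a view, so that I have a consistent data set for every day that I can query easily.
Edit: made the sample data and the expected result set clearer.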
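For anyone who wants to run the queries on this page, a minimal setup might look like the following. The table name score is taken from the queries further down. The column types, the serial key, and the unique constraint are assumptions, since only the column names are given above. The widget ids follow the desired output.

```sql
-- Assumed schema: only the column names come from the question.
CREATE TABLE score (
   id        serial PRIMARY KEY
 , widget_id integer NOT NULL
 , for_date  date    NOT NULL
 , score     integer NOT NULL
 , UNIQUE (widget_id, for_date)   -- at most one row per widget per day
);

-- Sample rows, widget ids as in the desired output.
INSERT INTO score (widget_id, for_date, score) VALUES
   (1336, '2012-05-07',  20)
 , (1337, '2012-05-07',  12)
 , (1337, '2012-05-08',  41)
 , (1337, '2012-05-11', 500);
```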
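-- Fill gaps per widget: a row without a score takes the most recent earlier score.
select
    widget_id,
    for_date,
    case
        when score is not null then score
        -- within a (widget_id, c) group, the first row is the one that has a score
        else first_value(score) over (partition by widget_id, c order by for_date)
    end score
from (
    select
        a.widget_id,
        a.for_date,
        s.score,
        -- running count of non-null scores: it only increases on a row that has
        -- a score, so that row and the gap rows after it share the same c
        count(s.score) over (partition by a.widget_id order by a.for_date) c
    from (
        -- one row per widget per day, across the whole date range in the table
        select widget_id, g.d::date for_date
        from (
            select distinct widget_id
            from score
        ) s
        cross join
        generate_series(
            (select min(for_date) from score),
            (select max(for_date) from score),
            '1 day'
        ) g(d)
    ) a
    left join
    score s on a.widget_id = s.widget_id and a.for_date = s.for_date
) s
order by widget_id, for_date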
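First of all, you can use a much simpler generate_series() table expression. This is equivalent to yours (except for the descending order, which contradicts the rest of your question anyway):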
SELECT generate_series('2012-01-01'::date, now()::date, '1d')::date
Input of type date is coerced to timestamptz automatically. The return type is timestamptz either way. I use a subquery below, so I can cast the output to date right away.
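A quick way to check the point about types is pg_typeof(). This is only a sketch with an arbitrary three-day range, and the column aliases are made up for illustration:

```sql
SELECT pg_typeof(d)       AS series_type   -- type returned by generate_series()
     , pg_typeof(d::date) AS after_cast    -- type after the explicit cast
FROM   generate_series('2012-01-01'::date, '2012-01-03'::date, '1d') d
LIMIT  1;
```

Going by the note above, this should report timestamp with time zone for the raw series and date after the cast.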
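Next, max() as a window function returns exactly what you need: the highest value since the start of the frame, ignoring NULL values. Because for_date grows along with the sort order, that running maximum is the most recent date that actually has a row. Building on that, you get a very simple query.
It is most likely faster than alternatives involving CROSS JOIN or WITH RECURSIVE: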
SELECT a.day, s.*
FROM  (
   SELECT d.day
        , max(s.for_date) OVER (ORDER BY d.day) AS effective_date
   FROM  (
      SELECT generate_series('2012-01-01'::date, now()::date, '1d')::date
      ) d(day)
   LEFT   JOIN score s ON s.for_date = d.day
                      AND s.widget_id = 1337  -- "for a given widget_id"
   ) a
LEFT   JOIN score s ON s.for_date = a.effective_date
                   AND s.widget_id = 1337
ORDER  BY a.day;
With this query you can put any column of score you like into the final SELECT list. I used s.* for simplicity. Pick your columns.
If you want the output to start with the first day that actually has a score, simply replace the last LEFT JOIN with JOIN.
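To see in isolation why the running max() carries the last known date forward, here is a small self-contained sketch. The rows come from a VALUES list rather than the score table, and the column names are made up for the demo:

```sql
-- day      = every calendar day in the range
-- for_date = the date of a matching score row, NULL where there is a gap
SELECT day
     , for_date
     , max(for_date) OVER (ORDER BY day) AS effective_date
FROM  (
   VALUES
     ('2012-05-08'::date, '2012-05-08'::date)
   , ('2012-05-09'      , NULL)
   , ('2012-05-10'      , NULL)
   , ('2012-05-11'      , '2012-05-11')
   ) t(day, for_date)
ORDER  BY day;
```

For the two gap days, effective_date stays at 2012-05-08, then switches to 2012-05-11 once a new row appears. That value is what the outer join uses to look up the score.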
Here I use a CROSS JOIN to produce one row for every widget on every date:
SELECT a.day, a.widget_id, s.score
FROM  (
   SELECT d.day, w.widget_id
        , max(s.for_date) OVER (PARTITION BY w.widget_id
                                ORDER BY d.day) AS effective_date
   FROM  (SELECT generate_series('2012-05-05'::date
                               , '2012-05-15'::date, '1d')::date AS day) d
   CROSS  JOIN (SELECT DISTINCT widget_id FROM score) AS w
   LEFT   JOIN score s ON s.for_date = d.day AND s.widget_id = w.widget_id
   ) a
JOIN   score s ON s.for_date = a.effective_date
              AND s.widget_id = a.widget_id  -- instead of LEFT JOIN
ORDER  BY a.day, a.widget_id;
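Since the stated goal is a view, the last query can be wrapped more or less as it stands. This is a sketch only: the view name widget_score_daily is made up, the start date is the one from the question, and the join assumes at most one row per widget per day, as described there. ORDER BY is left to the query that reads the view.

```sql
CREATE OR REPLACE VIEW widget_score_daily AS   -- hypothetical view name
SELECT a.day AS for_date, a.widget_id, s.score
FROM  (
   SELECT d.day, w.widget_id
        , max(s.for_date) OVER (PARTITION BY w.widget_id
                                ORDER BY d.day) AS effective_date
   FROM  (
      SELECT generate_series('2012-01-01'::date, now()::date, '1d')::date AS day
      ) d
   CROSS  JOIN (SELECT DISTINCT widget_id FROM score) w
   LEFT   JOIN score s ON s.for_date = d.day AND s.widget_id = w.widget_id
   ) a
JOIN   score s ON s.for_date = a.effective_date
              AND s.widget_id = a.widget_id;

-- Example use: the gap-filled history of one widget.
SELECT *
FROM   widget_score_daily
WHERE  widget_id = 1337
ORDER  BY for_date;
```

Because the inner JOIN is kept, each widget only shows up from its first day with a score onward.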