JV.*_*JV. 13 python database postgresql left-join generate-series
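I have unevenly distributed data (with respect to date) covering a few years (2003-2008). I want to query the data for a given pair of start and end dates, grouping it by any of the intervals supported in PostgreSQL 8.3 (day, week, month, quarter, year) (http://www.postgresql.org/docs/8.3/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC).
The problem is that some queries return results that are continuous over the required period, like this one: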
select to_char(date_trunc('month',date), 'YYYY-MM-DD'),count(distinct post_id)
from some_table where category_id=1 and entity_id = 77 and entity2_id = 115
and date <= '2008-12-06' and date >= '2007-12-01' group by
date_trunc('month',date) order by date_trunc('month',date);
to_char | count
------------+-------
2007-12-01 | 64
2008-01-01 | 31
2008-02-01 | 14
2008-03-01 | 21
2008-04-01 | 28
2008-05-01 | 44
2008-06-01 | 100
2008-07-01 | 72
2008-08-01 | 91
2008-09-01 | 92
2008-10-01 | 79
2008-11-01 | 65
(12 rows)
But others skip some intervals because no data is present for them, like this one:
select to_char(date_trunc('month',date), 'YYYY-MM-DD'),count(distinct post_id)
from some_table where category_id=1 and entity_id = 75 and entity2_id = 115
and date <= '2008-12-06' and date >= '2007-12-01' group by
date_trunc('month',date) order by date_trunc('month',date);
to_char | count
------------+-------
2007-12-01 | 2
2008-01-01 | 2
2008-03-01 | 1
2008-04-01 | 2
2008-06-01 | 1
2008-08-01 | 3
2008-10-01 | 2
(7 rows)
whereas the required result set is:
to_char | count
------------+-------
2007-12-01 | 2
2008-01-01 | 2
2008-02-01 | 0
2008-03-01 | 1
2008-04-01 | 2
2008-05-01 | 0
2008-06-01 | 1
2008-07-01 | 0
2008-08-01 | 3
2008-09-01 | 0
2008-10-01 | 2
2008-11-01 | 0
(12 rows)
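That is, with a count of 0 for the missing entries.
I have seen earlier discussions on Stack Overflow, but they don't seem to solve my problem, because my grouping period is one of (day, week, month, quarter, year) and is decided at runtime by the application. So an approach like a left join against a calendar table or a sequence table will not help, I guess.
My current solution is to fill in these gaps in Python (in a TurboGears app) using the calendar module.
Is there a better way to do this?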
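For readers who want to picture the Python-side workaround mentioned above, here is a minimal sketch for the monthly case only. It is not the asker's code: the names `month_range` and `fill_month_gaps` are hypothetical, it uses `datetime` rather than the `calendar` module, and it assumes the rows arrive as `(date_string, count)` pairs shaped like the query output. The end date is set to 2008-11-30 so that the result matches the 12 rows of the desired result set.

```python
from datetime import date


def month_range(start, end):
    """Yield the first day of every month from start's month to end's month."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def fill_month_gaps(rows, start, end):
    """Return one (date_string, count) pair per month, using 0 where rows has no entry."""
    counts = dict(rows)
    return [(d.isoformat(), counts.get(d.isoformat(), 0))
            for d in month_range(start, end)]


# Sparse rows copied from the second query's output in the question.
sparse = [
    ("2007-12-01", 2), ("2008-01-01", 2), ("2008-03-01", 1), ("2008-04-01", 2),
    ("2008-06-01", 1), ("2008-08-01", 3), ("2008-10-01", 2),
]
filled = fill_month_gaps(sparse, date(2007, 12, 1), date(2008, 11, 30))
for day, count in filled:
    print(day, "|", count)
```

Each grouping unit would need its own range generator in this style, which is part of why doing the gap filling in the database is attractive.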
Erw*_*ter 22
This question is old. But since fellow users picked it as the master for a new duplicate, I am adding a proper answer.
SELECT *
FROM (
SELECT day::date
FROM generate_series(timestamp '2007-12-01'
, timestamp '2008-12-01'
, interval '1 month') day
) d
LEFT JOIN (
SELECT date_trunc('month', date_col)::date AS day
, count(*) AS some_count
FROM tbl
WHERE date_col >= date '2007-12-01'
AND date_col <= date '2008-12-06'
-- AND ... more conditions
GROUP BY 1
) t USING (day)
ORDER BY day;
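LEFT JOIN, of course.
generate_series() can produce a table of timestamps on the fly, and very fast.
It is generally faster to aggregate before you join. I recently provided a test case on sqlfiddle.com in a related answer.
Cast the timestamp to date (::date) for a basic format. For more, use to_char().
GROUP BY 1 is syntax shorthand to reference the first output column. It could be GROUP BY day as well, but that might conflict with an existing column of the same name. Or GROUP BY date_trunc('month', date_col)::date, but that is too long for my taste.
Works with the available interval arguments for date_trunc().
count() never produces NULL (it returns 0 for no rows), but the LEFT JOIN does.
To return 0 instead of NULL in the outer SELECT, use COALESCE(some_count, 0) AS some_count. See the manual.
For a more generic solution or arbitrary time intervals, consider the closely related answer.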
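Since the question says the grouping unit is chosen by the application at runtime, the query above can be assembled from a whitelisted unit. The following is only a sketch under stated assumptions: `build_gap_free_query` and `STEP_FOR_UNIT` are hypothetical names, `tbl` and `date_col` are the placeholders used in the query above, and the `%(start)s` / `%(end)s` placeholders assume a driver with pyformat parameters such as psycopg2. The quarter is expressed as a step of '3 months', and the series bounds are truncated to the chosen unit so that they line up with the grouped dates.

```python
# Hypothetical mapping from a date_trunc() unit to a generate_series() step.
STEP_FOR_UNIT = {
    "day": "1 day",
    "week": "1 week",
    "month": "1 month",
    "quarter": "3 months",
    "year": "1 year",
}


def build_gap_free_query(unit):
    """Return SQL text that counts rows per unit, with 0 for empty periods.

    The unit is interpolated into the SQL text, which is only acceptable
    because it is checked against the whitelist first. The dates stay as
    driver parameters (assumed pyformat style).
    """
    if unit not in STEP_FOR_UNIT:
        raise ValueError("unsupported unit: %r" % (unit,))
    step = STEP_FOR_UNIT[unit]
    return f"""
SELECT day, COALESCE(some_count, 0) AS some_count
FROM  (
   SELECT day::date
   FROM   generate_series(date_trunc('{unit}', %(start)s::timestamp)
                        , date_trunc('{unit}', %(end)s::timestamp)
                        , interval '{step}') day
   ) d
LEFT   JOIN (
   SELECT date_trunc('{unit}', date_col)::date AS day
        , count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= %(start)s
   AND    date_col <= %(end)s
   GROUP  BY 1
   ) t USING (day)
ORDER  BY day;
"""


query = build_gap_free_query("quarter")
print(query)
```

With a psycopg2-style cursor this would be run as something like `cur.execute(query, {"start": "2007-12-01", "end": "2008-12-06"})`; that call is an assumption about the surrounding application, not part of the answer.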
Mar*_*wis 17
You can create the list of all first days of the last year (say) with
select distinct date_trunc('month', (current_date - offs)) as date
from generate_series(0,365,28) as offs;
date
------------------------
2007-12-01 00:00:00+01
2008-01-01 00:00:00+01
2008-02-01 00:00:00+01
2008-03-01 00:00:00+01
2008-04-01 00:00:00+02
2008-05-01 00:00:00+02
2008-06-01 00:00:00+02
2008-07-01 00:00:00+02
2008-08-01 00:00:00+02
2008-09-01 00:00:00+02
2008-10-01 00:00:00+02
2008-11-01 00:00:00+01
2008-12-01 00:00:00+01
Then you can join with that series.
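To see why stepping back 28 days at a time and de-duplicating yields every month start, the same logic can be mirrored in Python. This is an illustrative sketch, not part of the answer: `month_starts` is a hypothetical name, and the reference date 2008-12-06 is an assumption taken from the upper bound in the question's queries. Because no month is shorter than 28 days, at least one sampled date falls in each month of the window; months that are hit twice collapse in the set, which plays the role of `select distinct`.

```python
from datetime import date, timedelta


def month_starts(today, days_back=365, step=28):
    """Mirror date_trunc('month', current_date - offs) over generate_series(0, 365, 28)."""
    offsets = range(0, days_back + 1, step)  # 0, 28, ..., 364, like generate_series
    firsts = {(today - timedelta(days=offs)).replace(day=1) for offs in offsets}
    return sorted(firsts)


starts = month_starts(date(2008, 12, 6))
for first_day in starts:
    print(first_day.isoformat())
```

For this reference date the sketch produces the 13 month starts from 2007-12-01 through 2008-12-01, the same dates as in the output listed above.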