Jam*_*der 5 postgresql histogram window-functions
I have a table in PostgreSQL like this:
timestamp duration
2013-04-03 15:44:58 4
2013-04-03 15:56:12 2
2013-04-03 16:13:17 9
2013-04-03 16:16:30 3
2013-04-03 16:29:52 1
2013-04-03 16:38:25 1
2013-04-03 16:41:37 9
2013-04-03 16:44:49 1
2013-04-03 17:01:07 9
2013-04-03 17:07:48 1
2013-04-03 17:11:00 2
2013-04-03 17:11:16 2
2013-04-03 17:15:17 1
2013-04-03 17:16:53 4
2013-04-03 17:20:37 9
2013-04-03 17:20:53 3
2013-04-03 17:25:48 3
2013-04-03 17:29:26 1
2013-04-03 17:32:38 9
2013-04-03 17:36:55 4
I would like to get the following output:
timestampwindowstart = 2013-04-03 15:44:58
duration count
1 0
2 1
3 0
4 1
9 0
timestampwindowstart = 2013-04-03 15:59:58
duration count
1 0
2 0
3 0
4 0
9 1
timestampwindowstart = 2013-04-03 16:14:58
duration count
1 1
2 0
3 1
4 0
9 0
timestampwindowstart = 2013-04-03 16:29:58
duration count
1 2
2 0
3 0
4 0
9 1
etc.
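So basically it steps through the timestamps in 15-minute windows and outputs the distinct duration values together with their frequency (count). The timestampwindowstart value is the earliest timestamp of the window (i.e. timestampwindowfinish = timestampwindowstart + 15 minutes).
That way I can plot a histogram for each 15-minute interval...
I have tried reading up on this, but it is a bit complicated for me, and I don't have much time...
Thanks for your help!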
Quick and dirty way: http://sqlfiddle.com/#!1/bd2f6/21 (I named my column tstamp instead of your timestamp.)
with t as (
    select
        generate_series(mitstamp, matstamp, '15 minutes') as int,
        duration
    from
        (select min(tstamp) mitstamp, max(tstamp) as matstamp from tmp) a,
        (select duration from tmp group by duration) b
)
select
    int as timestampwindowstart,
    t.duration,
    count(tmp.duration)
from
    t
    left join tmp on
        (tmp.tstamp >= t.int and
         tmp.tstamp < (t.int + interval '15 minutes') and
         t.duration = tmp.duration)
group by
    int,
    t.duration
order by
    int,
    t.duration
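To try the query above locally instead of on SQL Fiddle, a minimal setup sketch might look like this. The table and column names (tmp, tstamp, duration) follow the answer; the column types are an assumption, and only the first eight rows of the question's data are loaded to keep it short.

```sql
-- Assumed schema: the question does not state the column types.
create table tmp (
    tstamp   timestamp not null,
    duration integer   not null
);

-- First eight rows of the sample data from the question.
insert into tmp (tstamp, duration) values
    ('2013-04-03 15:44:58', 4),
    ('2013-04-03 15:56:12', 2),
    ('2013-04-03 16:13:17', 9),
    ('2013-04-03 16:16:30', 3),
    ('2013-04-03 16:29:52', 1),
    ('2013-04-03 16:38:25', 1),
    ('2013-04-03 16:41:37', 9),
    ('2013-04-03 16:44:49', 1);
```

With just these rows, the generated windows start at 15:44:58, 15:59:58, 16:14:58 and 16:29:58, so the result should line up with the four windows listed in the question.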
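Brief explanation:
The left join keeps every combination of window start and duration in the output, so there will be null wherever a duration does not occur in a given interval. Since count(null) = 0, those combinations are reported with a count of 0.
If you have more tables and the algorithm should be applied to their union: suppose we have three tables, tmp1, tmp2 and tmp3, all containing the columns tstamp and duration. We can extend the previous solution: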
with
tmpout as (
    select * from tmp1 union all
    select * from tmp2 union all
    select * from tmp3
)
,t as (
    select
        generate_series(mitstamp, matstamp, '15 minutes') as int,
        duration
    from
        (select min(tstamp) mitstamp, max(tstamp) as matstamp from tmpout) a,
        (select duration from tmpout group by duration) b
)
select
    int as timestampwindowstart,
    t.duration,
    count(tmp.duration)
from
    t
    -- alias tmpout as tmp so that the tmp.* references resolve
    left join tmpout tmp on
        (tmp.tstamp >= t.int and
         tmp.tstamp < (t.int + interval '15 minutes') and
         t.duration = tmp.duration)
group by
    int,
    t.duration
order by
    int,
    t.duration
You should really learn about the with clause in PostgreSQL. It is an invaluable concept for any data analysis in PostgreSQL.
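As a small illustration of what the with clause does, the sketch below names an intermediate result and then reuses it in the main query. It assumes the same tmp table with a tstamp column of type timestamp; the names bounds, first_ts and last_ts are made up for this example.

```sql
-- "bounds" is a hypothetical CTE name; it holds the overall time range of the data.
with bounds as (
    select min(tstamp) as first_ts,
           max(tstamp) as last_ts
    from tmp
)
-- The main query refers to the CTE like an ordinary table and
-- produces one row per 15-minute window start.
select generate_series(first_ts, last_ts, interval '15 minutes') as timestampwindowstart
from bounds;
```

Running each CTE on its own like this is a handy way to check the intermediate steps of a longer query before joining everything together.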