use*_*798 39 postgresql group-by
I store my measurement data in the following structure:
CREATE TABLE measurements(
measured_at TIMESTAMPTZ,
val INTEGER
);
I already know that by using
(a) date_trunc('hour', measured_at)
and
(b) generate_series
I can aggregate my data by:
microseconds,
milliseconds
.
.
.
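But is it possible to aggregate the data by 5 minutes or, more generally, by an arbitrary number of seconds?
I need the data aggregated at different time resolutions so that I can feed it into FFT or AR models and look for possible seasonality.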
Mik*_*ll' 46
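You can generate a table of "buckets" by adding intervals created by generate_series(). This SQL statement generates a table of five-minute buckets for the first day in your data (the date of min(measured_at)).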
select
(select min(measured_at)::date from measurements) + ( n || ' minutes')::interval start_time,
(select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time
from generate_series(0, (24*60), 5) n
Wrap that statement in a common table expression, and you can join and group on it as if it were a base table.
with five_min_intervals as (
select
(select min(measured_at)::date from measurements) + ( n || ' minutes')::interval start_time,
(select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time
from generate_series(0, (24*60), 5) n
)
select f.start_time, f.end_time, avg(m.val) avg_val
from measurements m
right join five_min_intervals f
on m.measured_at >= f.start_time and m.measured_at < f.end_time
group by f.start_time, f.end_time
order by f.start_time
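Grouping by an arbitrary number of seconds is similar: use date_trunc().
A more general use of generate_series() avoids having to guess the upper limit for the five-minute buckets. In practice, you would probably build this as a view or a function. You might get better performance from a base table.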
select
(select min(measured_at)::date from measurements) + ( n || ' minutes')::interval start_time,
(select min(measured_at)::date from measurements) + ((n+5) || ' minutes')::interval end_time
from generate_series(0, ((select max(measured_at)::date - min(measured_at)::date from measurements) + 1)*24*60, 5) n;
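The "view or function" idea could look roughly like the sketch below. The function name measurement_buckets and its bucket_width parameter are hypothetical, not part of the answer; the sketch assumes the question's measurements table and a PostgreSQL version that allows named parameters in SQL functions (9.2 or later).

```sql
-- Hypothetical helper: one row per bucket, from the start of the first day
-- that has data up to the latest measurement.
CREATE OR REPLACE FUNCTION measurement_buckets(bucket_width interval)
RETURNS TABLE (start_time timestamptz, end_time timestamptz)
LANGUAGE sql STABLE AS $$
    SELECT s, s + bucket_width
    FROM generate_series(
        (SELECT date_trunc('day', min(measured_at)) FROM measurements),
        (SELECT max(measured_at) FROM measurements),
        bucket_width
    ) AS s;
$$;

-- Usage: 90-second buckets; buckets without measurements get a NULL average.
SELECT b.start_time, b.end_time, avg(m.val) AS avg_val
FROM measurement_buckets(INTERVAL '90 seconds') AS b
LEFT JOIN measurements m
  ON m.measured_at >= b.start_time AND m.measured_at < b.end_time
GROUP BY b.start_time, b.end_time
ORDER BY b.start_time;
```

Because the upper bound comes from max(measured_at), no bucket starts after the last measurement, and the bucket width can be any interval rather than a fixed five minutes.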
Jul*_*ian 12
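Catcall has a great answer. My example of using it demonstrates fixed buckets, in this case 30-minute intervals starting at midnight. It also shows that Catcall's first version can generate one extra bucket, and how to eliminate it. I wanted exactly 48 buckets in a day. In my problem, observations have separate date and time columns, and I want to average the observations within each 30-minute period across a month for a number of different services.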
with intervals as (
select
(n||' minutes')::interval as start_time,
((n+30)|| ' minutes')::interval as end_time
from generate_series(0, (23*60+30), 30) n
)
select i.start_time, o.service, avg(o.o)
from
observations o right join intervals i
on o.time >= i.start_time and o.time < i.end_time
where o.date between '2013-01-01' and '2013-01-31'
group by i.start_time, i.end_time, o.service
order by i.start_time
Lau*_*lbe 12
From PostgreSQL v14 on, you can use the date_bin function:
SELECT date_bin(
INTERVAL '5 minutes',
measured_at,
TIMESTAMPTZ '2000-01-01'
),
sum(val)
FROM measurements
GROUP BY 1;
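For an arbitrary number of seconds, the same call should work with a different stride. The sketch below assumes the question's measurements table and PostgreSQL 14 or later; the 90-second stride and the column aliases are only illustrative.

```sql
-- Sketch: 90-second buckets.
-- The third argument (the origin) only fixes where bucket boundaries fall;
-- 2000-01-01 is an arbitrary choice.
SELECT date_bin(INTERVAL '90 seconds', measured_at, TIMESTAMPTZ '2000-01-01') AS bucket_start,
       avg(val) AS avg_val,
       count(*) AS n
FROM measurements
GROUP BY bucket_start
ORDER BY bucket_start;
```

Note that the stride cannot contain units of months or larger, and that buckets with no measurements simply do not appear in the result. If a regularly spaced series is needed (for an FFT, say), joining against generate_series() buckets as in the other answers fills those gaps.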
gri*_*sha 10
How about
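-- floor() makes every row in the same 5-minute window share one bucket number;
-- the plain division is fractional and would put almost every row in its own group.
SELECT MIN(val),
floor(EXTRACT(epoch FROM measured_at) / EXTRACT(epoch FROM INTERVAL '5 min')) AS bucket
FROM measurements
GROUP BY bucket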
where '5 min' can be any expression supported by INTERVAL.
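A bucket number is awkward to read, so one possible variation is to convert it back into the bucket's start time. This is a sketch assuming the question's measurements table; the aliases bucket_start and min_val are illustrative, and floor() is applied so that the division yields whole bucket numbers.

```sql
-- Sketch: epoch seconds -> whole bucket number -> bucket start timestamp.
SELECT
    to_timestamp(
        floor(EXTRACT(epoch FROM measured_at) / EXTRACT(epoch FROM INTERVAL '5 min'))
        * EXTRACT(epoch FROM INTERVAL '5 min')
    ) AS bucket_start,
    min(val) AS min_val
FROM measurements
GROUP BY bucket_start
ORDER BY bucket_start;
```

Buckets here are aligned to the Unix epoch (1970-01-01 00:00:00 UTC), so a 5-minute width lands on :00, :05, :10 and so on in UTC.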
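The following gives you buckets of any size, even if they don't align with a nice minute/hour/whatever boundary. The value "300" is for a 5-minute grouping, but any value can be substituted: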
select measured_at,
val,
(date_trunc('seconds', (measured_at - timestamptz 'epoch') / 300) * 300 + timestamptz 'epoch') as aligned_measured_at
from measurements;
You can then wrap "val" in whatever aggregate you need and add "group by aligned_measured_at" as required.
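As a sketch of that last step, assuming the question's measurements table and the same 300-second width, the alignment expression can be grouped on directly. The aliases avg_val and n are illustrative.

```sql
-- Sketch: average and row count per 300-second bucket,
-- reusing the alignment expression from the query above.
SELECT
    date_trunc('seconds', (measured_at - timestamptz 'epoch') / 300) * 300
        + timestamptz 'epoch' AS aligned_measured_at,
    avg(val) AS avg_val,
    count(*) AS n
FROM measurements
GROUP BY aligned_measured_at
ORDER BY aligned_measured_at;
```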
This is based on Mike Sherrill's answer, except that it uses timestamp ranges instead of separate start/end columns.
with intervals as (
select tstzrange(s, s + '5 minutes') das_interval
from (select generate_series(min(lower(time_range)), max(upper(time_range)), '5 minutes') s
from your_table) x)
select das_interval, your_table.*
from your_table
right join intervals on time_range && das_interval
order by das_interval;
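I wanted to look at the past 24 hours of data and count things in hourly increments. I started with Cat Recall's solution, which is pretty slick. It is bound to the data, though, rather than to what happened in the past 24 hours. So I refactored and ended up with something quite close to Julian's solution, but with more CTEs. So it is a combination of the two answers.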
WITH interval_query AS (
SELECT (ts ||' hour')::INTERVAL AS hour_interval
FROM generate_series(0,23) AS ts
), time_series AS (
SELECT date_trunc('hour', now()) + INTERVAL '60 min' * ROUND(date_part('minute', now()) / 60.0) - interval_query.hour_interval AS start_time
FROM interval_query
), time_intervals AS (
SELECT start_time, start_time + '1 hour'::INTERVAL AS end_time
FROM time_series ORDER BY start_time
), reading_counts AS (
SELECT f.start_time, f.end_time, br.minor, count(br.id) readings
FROM beacon_readings br
RIGHT JOIN time_intervals f
ON br.reading_timestamp >= f.start_time AND br.reading_timestamp < f.end_time AND br.major = 4
GROUP BY f.start_time, f.end_time, br.minor
ORDER BY f.start_time, br.minor
)
SELECT * FROM reading_counts
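Note that any other restrictions I want in the final query need to go in the RIGHT JOIN condition. I'm not suggesting this is necessarily the best (or even a good) approach, but it is what I am running (at least for now) in a dashboard.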
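The reason the extra restrictions belong in the join condition can be sketched with the question's measurements table. The val > 0 filter below is a made-up restriction used only for illustration.

```sql
-- Sketch: hourly counts for the last 24 hourly buckets.
WITH hours AS (
    SELECT h AS start_time, h + INTERVAL '1 hour' AS end_time
    FROM generate_series(
        date_trunc('hour', now()) - INTERVAL '23 hours',
        date_trunc('hour', now()),
        INTERVAL '1 hour'
    ) AS h
)
SELECT f.start_time, count(m.val) AS readings
FROM measurements m
RIGHT JOIN hours f
  ON m.measured_at >= f.start_time
 AND m.measured_at <  f.end_time
 AND m.val > 0            -- hypothetical extra restriction, kept in the ON clause
GROUP BY f.start_time
ORDER BY f.start_time;
```

Written this way, the query returns all 24 hours, with a count of 0 for hours that have no matching rows. Moving m.val > 0 into a WHERE clause would drop those hours, because the outer join fills m.val with NULL for them and NULL > 0 is not true.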