Gab*_*iMe 35 sql postgresql datetime aggregate-functions window-functions
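I need to query, for each minute, the total count of rows up to that minute.
The best I could achieve so far doesn't do the trick. It returns the count per minute, not the total count up to each minute: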
SELECT COUNT(id) AS count
, EXTRACT(hour from "when") AS hour
, EXTRACT(minute from "when") AS minute
FROM mytable
GROUP BY hour, minute
Erw*_*ter 87
SELECT DISTINCT
date_trunc('minute', "when") AS minute
, count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM mytable
ORDER BY 1;
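Use date_trunc(); it returns exactly what you need.
Don't include id in the query, since you want to GROUP BY minute slices.
count() is typically used as a plain aggregate function. Appending an OVER clause makes it a window function. Omit PARTITION BY in the window definition: you want a running count over all rows. By default, it counts from the first row to the last peer of the current row, as defined by ORDER BY. I quote the manual:
The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer.
And that happens to be exactly what you need.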
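To make the effect of the default frame concrete, here is a minimal sketch that runs the same kind of window count in an in-memory SQLite database through Python's sqlite3 module. SQLite (3.25 or later) is only a stand-in for Postgres here, and the table contents and the pre-truncated "minute" text column are assumptions made up for the demo. It contrasts the default RANGE frame with an explicit ROWS frame.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for Postgres; "minute" is already truncated.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable ("minute" TEXT)')
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [("10:00",), ("10:00",), ("10:01",), ("10:03",), ("10:03",), ("10:03",)],
)

# Default frame (RANGE UNBOUNDED PRECEDING): all peers of a minute share one count,
# so DISTINCT collapses them to a single row per minute.
range_rows = conn.execute(
    'SELECT DISTINCT "minute", count(*) OVER (ORDER BY "minute") AS running_ct '
    "FROM mytable ORDER BY 1"
).fetchall()

# Explicit ROWS frame: each row gets its own count, so peers no longer collapse.
rows_rows = conn.execute(
    'SELECT DISTINCT "minute", '
    'count(*) OVER (ORDER BY "minute" ROWS UNBOUNDED PRECEDING) AS running_ct '
    "FROM mytable ORDER BY 1, 2"
).fetchall()

print(range_rows)  # [('10:00', 2), ('10:01', 3), ('10:03', 6)]
print(rows_rows)   # six rows, counts 1 through 6
```

With the RANGE frame every row of a given minute already carries the total up to and including that minute, which is why the short query works with a plain DISTINCT.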
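Use count(*) rather than count(id). It better fits your question ("count of rows"). It is generally slightly faster than count(id). And, while we might assume that id is NOT NULL, that has not been specified in the question, so count(id) is, strictly speaking, wrong, because NULL values are not counted by count(id).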
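The NULL behavior is easy to check. The following sketch again uses an in-memory SQLite database as a stand-in for Postgres; the nullable id column and the three sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption for the demo: id is nullable, unlike a typical primary key.
conn.execute("CREATE TABLE mytable (id INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?)", [(1,), (None,), (3,)])

# count(*) counts rows; count(id) counts only rows where id IS NOT NULL.
star_ct, id_ct = conn.execute(
    "SELECT count(*), count(id) FROM mytable"
).fetchone()

print(star_ct, id_ct)  # 3 2
```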
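You can't GROUP BY minute slices at the same query level. Aggregate functions are applied before window functions, so the window function count(*) would only see 1 row per minute this way.
You can, however, SELECT DISTINCT, because DISTINCT is applied after window functions.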
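The difference in evaluation order can be seen in a small sketch. As before, SQLite (3.25 or later) via Python's sqlite3 is only a stand-in for Postgres, and the sample data with a pre-truncated "minute" column is an assumption for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable ("minute" TEXT)')
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [("10:00",), ("10:00",), ("10:01",), ("10:03",), ("10:03",), ("10:03",)],
)

# GROUP BY runs first: the window function sees one row per minute
# and merely numbers the minutes.
grouped = conn.execute(
    'SELECT "minute", count(*) OVER (ORDER BY "minute") AS running_ct '
    'FROM mytable GROUP BY "minute" ORDER BY 1'
).fetchall()

# DISTINCT runs last: the window function sees every row,
# then duplicates are removed.
distinct = conn.execute(
    'SELECT DISTINCT "minute", count(*) OVER (ORDER BY "minute") AS running_ct '
    "FROM mytable ORDER BY 1"
).fetchall()

print(grouped)   # counts 1, 2, 3: not the running row count
print(distinct)  # counts 2, 3, 6: the running row count
```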
ORDER BY 1 is just shorthand for ORDER BY date_trunc('minute', "when") here.
1 is a positional reference to the first expression in the SELECT list.
Use to_char() if you need to format the result. Like:
SELECT DISTINCT
to_char(date_trunc('minute', "when"), 'DD.MM.YYYY HH24:MI') AS minute
, count(*) OVER (ORDER BY date_trunc('minute', "when")) AS running_ct
FROM mytable
ORDER BY date_trunc('minute', "when");
SELECT minute, sum(minute_ct) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) sub
ORDER  BY 1;
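Much like the above, but:
I use a subquery to aggregate and count rows per minute. This way we get 1 row per minute without DISTINCT in the outer SELECT.
Use sum() as the window aggregate function now, to add up the counts from the subquery.
I found this to be substantially faster with many rows per minute.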
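As a sanity check that the two formulations agree, here is a small sketch comparing them on the same data. It uses SQLite (3.25 or later) through Python's sqlite3 as a stand-in for Postgres, with made-up rows and a pre-truncated "minute" column as assumptions. It only checks that the results match; it says nothing about the speed difference, which depends on Postgres and real data volumes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tbl ("minute" TEXT)')
conn.executemany(
    "INSERT INTO tbl VALUES (?)",
    [("10:00",), ("10:00",), ("10:01",), ("10:03",), ("10:03",), ("10:03",)],
)

# Variant 1: window count over all rows, then DISTINCT.
distinct_variant = conn.execute(
    'SELECT DISTINCT "minute", count(*) OVER (ORDER BY "minute") AS running_ct '
    "FROM tbl ORDER BY 1"
).fetchall()

# Variant 2: aggregate per minute first, then a running sum over the counts.
subquery_variant = conn.execute(
    'SELECT "minute", sum(minute_ct) OVER (ORDER BY "minute") AS running_ct '
    'FROM (SELECT "minute", count(*) AS minute_ct FROM tbl GROUP BY "minute") sub '
    "ORDER BY 1"
).fetchall()

print(distinct_variant == subquery_variant)  # True
```

The second variant hands the window function one row per minute instead of one row per event, which is a plausible reason for it being faster when there are many rows per minute.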
@GabiMe asked in a comment how to get one row for every minute in the time frame, including those where no event occurred (no row in the base table):
SELECT DISTINCT
       minute, count(c.minute) OVER (ORDER BY minute) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        , max("when")
                        , interval '1 min')
   FROM   tbl
   ) m(minute)
LEFT   JOIN (SELECT date_trunc('minute', "when") FROM tbl) c(minute) USING (minute)
ORDER  BY 1;
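Generate a row for every minute in the time frame between the first and the last event with generate_series(), here directly based on aggregated values from the subquery.
LEFT JOIN to all timestamps truncated to the minute and count. NULL values (where no row exists) do not add to the running count.
With a CTE: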
WITH cte AS (
   SELECT date_trunc('minute', "when") AS minute, count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   )
SELECT m.minute
     , COALESCE(sum(cte.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(min(minute), max(minute), interval '1 min')
   FROM   cte
   ) m(minute)
LEFT   JOIN cte USING (minute)
ORDER  BY 1;
Again, aggregate and count rows per minute in the first step; this removes the need for DISTINCT later.
Unlike count(), sum() can return NULL. Default to 0 with COALESCE.
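A quick sketch of that difference, using an in-memory SQLite database as a stand-in for Postgres. The table name minute_counts and its single NULL row are hypothetical, chosen to mimic a minute that found no match in the LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row whose count is NULL, like an unmatched LEFT JOIN row.
conn.execute("CREATE TABLE minute_counts (minute_ct INTEGER)")
conn.execute("INSERT INTO minute_counts VALUES (NULL)")

# count() ignores NULL and yields 0; sum() over only NULL input yields NULL;
# COALESCE replaces that NULL with 0.
ct, total, total_or_zero = conn.execute(
    "SELECT count(minute_ct), sum(minute_ct), COALESCE(sum(minute_ct), 0) "
    "FROM minute_counts"
).fetchone()

print(ct, total, total_or_zero)  # 0 None 0
```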
With many rows and an index on "when", this version with a subquery was the fastest among a couple of variants I tested with Postgres 9.1 - 9.4:
SELECT m.minute
     , COALESCE(sum(c.minute_ct) OVER (ORDER BY m.minute), 0) AS running_ct
FROM  (
   SELECT generate_series(date_trunc('minute', min("when"))
                        , max("when")
                        , interval '1 min')
   FROM   tbl
   ) m(minute)
LEFT   JOIN (
   SELECT date_trunc('minute', "when") AS minute
        , count(*) AS minute_ct
   FROM   tbl
   GROUP  BY 1
   ) c USING (minute)
ORDER  BY 1;