max*_*max 2 sql postgresql indexing performance query-optimization
我有以下查询,它为每个查询获取id最新的N :observationsstation
SELECT id
FROM (
SELECT station_id, id, created_at,
row_number() OVER(PARTITION BY station_id
ORDER BY created_at DESC) AS rn
FROM (
SELECT station_id, id, created_at
FROM observations
) s
) s
WHERE rn <= #{n}
ORDER BY station_id, created_at DESC;
Run Code Online (Sandbox Code Playgroud)
我有指标的id,station_id,created_at。
这是我想出的唯一解决方案,每个站可以获取多个记录。但是,它非常慢(81000条记录的表为154.0毫秒)。
如何加快查询速度?
假设至少Postgres 9.3。
首先,多列索引将有助于:
CREATE INDEX observations_special_idx
ON observations(station_id, created_at DESC, id)
Run Code Online (Sandbox Code Playgroud)
created_at DESC的拟合度稍好一些,但是如果不使用,索引仍将以几乎相同的速度向后扫描DESC。
假设created_at已定义NOT NULL,则DESC NULLS LAST在索引和查询中考虑:
最后一列id仅在从中获得仅索引的扫描时才有用,如果不断添加许多新行,则可能无法使用。在这种情况下,请id从索引中删除。
简化查询,内部子选择无济于事:
SELECT id
FROM (
SELECT station_id, id, created_at
, row_number() OVER (PARTITION BY station_id
ORDER BY created_at DESC) AS rn
FROM observations
) s
WHERE rn <= #{n} -- your limit here
ORDER BY station_id, created_at DESC;
Run Code Online (Sandbox Code Playgroud)
应该会快一点,但仍然很慢。
station_idid定义为NOT NULL。为了真正快,您需要等效的宽松索引扫描(尚未在Postgres中实现)。相关答案:
如果您有一个单独的表stations(似乎是这样),则可以使用JOIN LATERAL(Postgres 9.3+)进行模拟:
SELECT o.id
FROM stations s
CROSS JOIN LATERAL (
SELECT o.id
FROM observations o
WHERE o.station_id = s.station_id -- lateral reference
ORDER BY o.created_at DESC
LIMIT #{n} -- your limit here
) o
ORDER BY s.station_id, o.created_at DESC;
Run Code Online (Sandbox Code Playgroud)
如果您没有的表格stations,那么下一个最好的办法就是创建并维护一个表格。可能添加外键引用以增强关系完整性。
如果这不是一个选择,则可以即时提取此类表格。简单的选择是:
SELECT DISTINCT station_id FROM observations;
SELECT station_id FROM observations GROUP BY 1;Run Code Online (Sandbox Code Playgroud)
但是任何一个都需要顺序扫描并且很慢。使Postgres使用station_id具有递归CTE的上述索引(或任何具有前导列的btree索引):
WITH RECURSIVE stations AS (
( -- extra pair of parentheses ...
SELECT station_id
FROM observations
ORDER BY station_id
LIMIT 1
) -- ... is required!
UNION ALL
SELECT (SELECT o.station_id
FROM observations o
WHERE o.station_id > s.station_id
ORDER BY o.station_id
LIMIT 1)
FROM stations s
WHERE s.station_id IS NOT NULL -- serves as break condition
)
SELECT station_id
FROM stations
WHERE station_id IS NOT NULL; -- remove dangling row with NULL
Run Code Online (Sandbox Code Playgroud)
使用它作为简易替换为stations在上述简单的查询表:
WITH RECURSIVE stations AS (
(
SELECT station_id
FROM observations
ORDER BY station_id
LIMIT 1
)
UNION ALL
SELECT (SELECT o.station_id
FROM observations o
WHERE o.station_id > s.station_id
ORDER BY o.station_id
LIMIT 1)
FROM stations s
WHERE s.station_id IS NOT NULL
)
SELECT o.id
FROM stations s
CROSS JOIN LATERAL (
SELECT o.id, o.created_at
FROM observations o
WHERE o.station_id = s.station_id
ORDER BY o.created_at DESC
LIMIT #{n} -- your limit here
) o
WHERE s.station_id IS NOT NULL
ORDER BY s.station_id, o.created_at DESC;
Run Code Online (Sandbox Code Playgroud)
这应该仍然比您的数量级要快。
| 归档时间: |
|
| 查看次数: |
1542 次 |
| 最近记录: |