Fey*_*eyd 17 postgresql performance index greatest-n-per-group
Given the table:
Column | Type
id | integer
latitude | numeric(9,6)
longitude | numeric(9,6)
speed | integer
equipment_id | integer
created_at | timestamp without time zone
Indexes:
"geoposition_records_pkey" PRIMARY KEY, btree (id)
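The table has 20 million records, which is not a large number, relatively speaking. But it makes sequential scans slow.
How can I get the last record (max(created_at)) of each equipment_id?
I have tried the following two queries, with several variants, after reading many answers on this topic: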
select max(created_at),equipment_id from geoposition_records group by equipment_id;
select distinct on (equipment_id) equipment_id,created_at
from geoposition_records order by equipment_id, created_at desc;
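I have also tried creating a btree index on (equipment_id, created_at), but Postgres finds that a seq scan is faster. Forcing enable_seqscan = off does not help either, because reading the index is as slow as the seq scan, probably worse.
The query must run periodically, always returning the latest record per equipment_id.
Using Postgres 9.3.
Explain/analyze (with 1.7 million records):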
set enable_seqscan=true;
explain analyze select max(created_at),equipment_id from geoposition_records group by equipment_id;
"HashAggregate (cost=47803.77..47804.34 rows=57 width=12) (actual time=1935.536..1935.556 rows=58 loops=1)"
" -> Seq Scan on geoposition_records (cost=0.00..39544.51 rows=1651851 width=12) (actual time=0.029..494.296 rows=1651851 loops=1)"
"Total runtime: 1935.632 ms"
set enable_seqscan=false;
explain analyze select max(created_at),equipment_id from geoposition_records group by equipment_id;
"GroupAggregate (cost=0.00..2995933.57 rows=57 width=12) (actual time=222.034..11305.073 rows=58 loops=1)"
" -> Index Scan using geoposition_records_equipment_id_created_at_idx on geoposition_records (cost=0.00..2987673.75 rows=1651851 width=12) (actual time=0.062..10248.703 rows=1651851 loops=1)"
"Total runtime: 11305.161 ms"
Erw*_*ter 11
A plain multicolumn B-tree index should work after all:
CREATE INDEX foo_idx
ON geoposition_records (equipment_id, created_at DESC NULLS LAST);
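Why DESC NULLS LAST? In a descending sort Postgres places NULL values first by default, so NULLS LAST keeps a row with a NULL created_at from being picked as the "latest". The index is declared with the same ordering so that it matches the ORDER BY of the queries below.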
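To make the NULL ordering concrete, here is a minimal illustration using made-up literal values (not data from the question). If created_at were declared NOT NULL, plain DESC would behave the same and the distinction would not matter.

```sql
-- Default for DESC is NULLS FIRST: the NULL row is returned first.
SELECT ts
FROM  (VALUES (timestamp '2020-01-01 00:00'), (NULL)) AS t(ts)
ORDER  BY ts DESC
LIMIT  1;

-- With NULLS LAST the newest non-null timestamp is returned first.
SELECT ts
FROM  (VALUES (timestamp '2020-01-01 00:00'), (NULL)) AS t(ts)
ORDER  BY ts DESC NULLS LAST
LIMIT  1;
```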
It is safe to assume you have an equipment table? Then performance won't be a problem. Based on that equipment table, run a lowly correlated subquery to great effect:
SELECT equipment_id
, (SELECT created_at
FROM geoposition_records
WHERE equipment_id = eq.equipment_id
ORDER BY created_at DESC NULLS LAST
LIMIT 1) AS latest
FROM equipment eq;
For a small number of rows in the equipment table (57, judging from your EXPLAIN ANALYZE output), this is very fast.
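One way to check that the planner actually uses the index is to run the same query under EXPLAIN. This is only a sketch: it assumes the index foo_idx has been created and that the equipment table has an equipment_id column, as the query in this answer does.

```sql
-- Same correlated subquery as in the answer, wrapped in EXPLAIN
-- to show the chosen plan, actual timings, and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT equipment_id
     , (SELECT created_at
        FROM   geoposition_records
        WHERE  equipment_id = eq.equipment_id
        ORDER  BY created_at DESC NULLS LAST
        LIMIT  1) AS latest
FROM   equipment eq;
```

The hoped-for plan is a SubPlan containing a Limit over an index scan (or index-only scan) on foo_idx, executed once per row of equipment, instead of a single scan over all of geoposition_records.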
LATERAL join in Postgres 9.3+:
SELECT eq.equipment_id, r.latest
FROM equipment eq
LEFT JOIN LATERAL (
SELECT created_at
FROM geoposition_records
WHERE equipment_id = eq.equipment_id
ORDER BY created_at DESC NULLS LAST
LIMIT 1
) r(latest) ON true;
Detailed explanation:
Performance is similar to the correlated subquery.
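Since the question asks for the last record and not only its timestamp, a LATERAL subquery can return several columns at once, which a scalar correlated subquery cannot. A sketch under the same assumptions as above (an equipment table with an equipment_id column); the column list is taken from the table definition in the question:

```sql
-- Latest full record per equipment; equipment without any
-- records is kept, with NULLs, because of the LEFT JOIN.
SELECT eq.equipment_id, r.*
FROM   equipment eq
LEFT   JOIN LATERAL (
   SELECT g.id, g.latitude, g.longitude, g.speed, g.created_at
   FROM   geoposition_records g
   WHERE  g.equipment_id = eq.equipment_id
   ORDER  BY g.created_at DESC NULLS LAST
   LIMIT  1
   ) r ON true;
```

Because the extra columns are not part of the index, each lookup also has to visit the table row, so an index-only scan is not possible for this variant.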
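If you can't talk sense into the query planner (which shouldn't happen), a function looping through the equipment table is certain to do the trick. Looking up one equipment_id at a time uses the index.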
CREATE OR REPLACE FUNCTION f_latest_equip()
RETURNS TABLE (equipment_id int, latest timestamp)
LANGUAGE plpgsql STABLE AS
$func$
BEGIN
FOR equipment_id IN
SELECT e.equipment_id FROM equipment e ORDER BY 1
LOOP
SELECT g.created_at
FROM geoposition_records g
WHERE g.equipment_id = f_latest_equip.equipment_id
-- prepend function name to disambiguate
ORDER BY g.created_at DESC NULLS LAST
LIMIT 1
INTO latest;
RETURN NEXT;
END LOOP;
END
$func$;
Makes for a nice call, too:
SELECT * FROM f_latest_equip();
Attempt 1
If there is an equipment table, and there is an index on geoposition_records(equipment_id, created_at desc), then the following works for me:
select id as equipment_id, (select max(created_at)
from geoposition_records
where equipment_id = equipment.id
) as max_created_at
from equipment;
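I wasn't able to force PG to run a fast query that determines both the list of equipment_ids and the related max(created_at). But I'm going to try again tomorrow!
Attempt 2
I found this link: http://zogovic.com/post/44856908222/optimizing-postgresql-query-for-distinct-values
Combining this technique with my query from attempt 1, I get: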
WITH RECURSIVE equipment(id) AS (
SELECT MIN(equipment_id) FROM geoposition_records
UNION
SELECT (
SELECT equipment_id
FROM geoposition_records
WHERE equipment_id > equipment.id
ORDER BY equipment_id
LIMIT 1
)
FROM equipment WHERE id IS NOT NULL
)
SELECT id AS equipment_id, (SELECT MAX(created_at)
FROM geoposition_records
WHERE equipment_id = equipment.id
) AS max_created_at
FROM equipment;
And it works FAST! But you need an index on geoposition_records(equipment_id, created_at desc).
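A possible variation, not the query from this answer: with LATERAL available in Postgres 9.3, the lookup of the latest row can be folded into the recursive step itself, so each iteration jumps to the next equipment_id and picks up its newest created_at in a single index probe. The names cte, c, and l are illustrative, and the sketch assumes an index on (equipment_id, created_at DESC NULLS LAST).

```sql
WITH RECURSIVE cte AS (
   (  -- parentheses required around ORDER BY / LIMIT in a UNION branch
   SELECT equipment_id, created_at
   FROM   geoposition_records
   ORDER  BY equipment_id, created_at DESC NULLS LAST
   LIMIT  1
   )
   UNION ALL
   SELECT l.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      SELECT g.equipment_id, g.created_at
      FROM   geoposition_records g
      WHERE  g.equipment_id > c.equipment_id  -- jump to the next group
      ORDER  BY g.equipment_id, g.created_at DESC NULLS LAST
      LIMIT  1
      ) l
   )
SELECT equipment_id, created_at AS max_created_at
FROM   cte;
```

The recursion stops when the lateral subquery finds no larger equipment_id. Unlike the versions based on an equipment table, this returns only equipment_ids that have at least one row in geoposition_records.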