Dmi*_*tro 4 postgresql index range-types query-performance postgresql-performance
I have a table events with the following fields:
id
user_id
time_start
time_end
...
There is a B-tree index on (time_start, time_end).
SELECT user_id
FROM events
WHERE ((time_start <= '2021-08-24T15:30:00+00:00' AND time_end >= '2021-08-24T15:30:00+00:00') OR
(time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00'))
GROUP BY user_id;
Group  (cost=243735.42..243998.32 rows=1103 width=4) (actual time=186.533..188.244 rows=166 loops=1)
  Group Key: user_id
  Buffers: shared hit=224848
  ->  Gather Merge  (cost=243735.42..243992.80 rows=2206 width=4) (actual time=186.532..188.199 rows=176 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=224848
        ->  Sort  (cost=242735.39..242738.15 rows=1103 width=4) (actual time=184.121..184.126 rows=59 loops=3)
              Sort Key: user_id
              Sort Method: quicksort  Memory: 27kB
              Worker 0:  Sort Method: quicksort  Memory: 27kB
              Worker 1:  Sort Method: quicksort  Memory: 28kB
              Buffers: shared hit=224848
              ->  Partial HashAggregate  (cost=242668.62..242679.65 rows=1103 width=4) (actual time=184.065..184.085 rows=59 loops=3)
                    Group Key: user_id
                    Buffers: shared hit=224834
                    ->  Parallel Seq Scan on events  (cost=0.00..242553.74 rows=45952 width=4) (actual time=104.085..183.994 rows=64 loops=3)
                          Filter: (((time_start <= '2021-08-24 15:30:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:30:00+00'::timestamp with time zone)) OR ((time_start <= '2021-08-24 15:59:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:59:00+00'::timestamp with time zone)))
                          Rows Removed by Filter: 708728
                          Buffers: shared hit=224834
Planning Time: 0.169 ms
Execution Time: 188.294 ms
Postgres uses a Seq Scan with this filter:
Filter: (((time_start <= '2021-08-24 15:30:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:30:00+00'::timestamp with time zone)) OR ((time_start <= '2021-08-24 15:59:00+00'::timestamp with time zone) AND (time_end >= '2021-08-24 15:59:00+00'::timestamp with time zone)))
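But when I leave only one of the two conditions on time_start and time_end, it starts using an Index Scan.
How can I change the condition so that Postgres uses an Index Scan instead of a Seq Scan?
I don't want to use UNION like this: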
(SELECT user_id
FROM events
WHERE (time_start <= '2021-08-24T15:30:00+00:00' AND time_end >= '2021-08-24T15:30:00+00:00')
GROUP BY user_id)
UNION
(SELECT user_id
FROM events
WHERE (time_start <= '2021-08-24T15:59:00+00:00' AND time_end >= '2021-08-24T15:59:00+00:00')
GROUP BY user_id);
A GiST or (even better) an SP-GiST expression index on the timestamp range should work wonders:
CREATE INDEX events_right_idx ON events USING spgist (tsrange(time_start, time_end, '[]'));
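Rewrite your query with the "range contains element" operator @> and match the index expression (the result is exactly equivalent to your original query):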
SELECT user_id
FROM events
WHERE tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:30:00'
OR tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:59:00'
GROUP BY user_id;
The query plan should then use the new index, typically with one bitmap index scan on events_right_idx per OR branch, combined with a BitmapOr.
That should be much faster.
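To check that the planner actually picks up the new index, the rewritten query can be run under EXPLAIN. This is only a sketch: it assumes the index events_right_idx from above exists, and note that the ANALYZE option really executes the query.

```sql
-- Show the actual plan, timings and buffer usage for the rewritten query.
-- Look for "Bitmap Index Scan on events_right_idx" (or an Index Scan)
-- in place of "Parallel Seq Scan on events".
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id
FROM   events
WHERE  tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:30:00'
   OR  tsrange(time_start, time_end, '[]') @> timestamp '2021-08-24 15:59:00'
GROUP  BY user_id;
```

If the plan still shows a sequential scan, compare the expression in the query with the one in the index definition; the two have to match for the index to be considered.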
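Unless instructed otherwise, range types assume an inclusive lower bound and an exclusive upper bound. tsrange(time_start, time_end) is the same as tsrange(time_start, time_end, '[)').
Since you operate with >= and <=, include both bounds with tsrange(time_start, time_end, '[]').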
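The effect of the bounds argument is easy to see with a timestamp that sits exactly on the upper bound. A minimal illustration, with literal values chosen only for the demonstration:

```sql
-- The default '[)' excludes the upper bound; '[]' includes it.
SELECT tsrange(timestamp '2021-08-24 15:00', timestamp '2021-08-24 15:30')
           @> timestamp '2021-08-24 15:30' AS half_open,  -- false
       tsrange(timestamp '2021-08-24 15:00', timestamp '2021-08-24 15:30', '[]')
           @> timestamp '2021-08-24 15:30' AS closed;     -- true
```

Only the '[]' variant matches what time_start <= x AND time_end >= x returns for an event that ends exactly at x.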
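The index should be a bit faster as a plain (non-expression) index, though.
You could add a generated timestamp range column to the table, like: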
ALTER TABLE events ADD COLUMN ts_range tsrange GENERATED ALWAYS AS (tsrange(time_start, time_end, '[]')) STORED;
Or, more radically, replace time_start and time_end with the range column. Then index and query become a bit simpler:
CREATE INDEX events_right_idx ON events USING spgist (ts_range);
SELECT user_id
FROM events
WHERE ts_range @> timestamp '2021-08-24T15:30:00'
OR ts_range @> timestamp '2021-08-24T15:59:00'
GROUP BY user_id;
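But a tsrange column occupies more space than two timestamp columns. Weigh costs and benefits.
Postgres 14 (currently beta) even allows covering SP-GiST indexes. The release notes:
Allow SP-GiST to use INCLUDE'd columns (Pavel Borisov)
But I don't think you can get an index-only scan for this particular query.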
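For reference, a covering SP-GiST index on the range column might be declared as below. This is a sketch under assumptions: it needs Postgres 14 or later, it relies on the ts_range column suggested above, and the index name events_ts_range_covering_idx is made up for illustration. Whether the planner then chooses an index-only scan for this query is not something to count on.

```sql
-- Requires Postgres 14+: SP-GiST indexes accept INCLUDE columns.
-- user_id is stored in the index as payload and is not searchable through it.
CREATE INDEX events_ts_range_covering_idx
    ON events USING spgist (ts_range) INCLUDE (user_id);
```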
If you are stuck with the B-tree index for some reason, this fixed UNION query shouldn't be too bad:
SELECT user_id
FROM events
WHERE '2021-08-24T15:30:00' BETWEEN time_start AND time_end
UNION
SELECT user_id
FROM events
WHERE '2021-08-24T15:59:00' BETWEEN time_start AND time_end
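Notably, there is no GROUP BY. UNION (without ALL) already removes duplicates, so it does all the work.
It is also simplified with BETWEEN (no effect on performance).
Also, you seem to have a wild mix of timestamp without time zone and timestamp with time zone. And the columns are named "time", which adds to the confusion. Typically, timestamptz is the better choice.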
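The Filter line in the plan from the question casts to timestamp with time zone, which suggests the columns are timestamptz. tsrange() expects timestamp without time zone inputs, so in that case the matching range type is most likely tstzrange. A sketch of the same index and query under that assumption (the index name events_tstz_range_idx is hypothetical):

```sql
-- Assumes time_start and time_end are both timestamptz.
CREATE INDEX events_tstz_range_idx
    ON events USING spgist (tstzrange(time_start, time_end, '[]'));

-- Same query shape as before, with timestamptz literals that carry an offset.
SELECT user_id
FROM   events
WHERE  tstzrange(time_start, time_end, '[]') @> timestamptz '2021-08-24 15:30:00+00'
   OR  tstzrange(time_start, time_end, '[]') @> timestamptz '2021-08-24 15:59:00+00'
GROUP  BY user_id;
```

Writing the offset explicitly keeps the comparison independent of the session's TimeZone setting.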
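Last but not least, the gap between 45952 estimated rows and 64 actual rows indicates inaccurate column statistics, which leads to a suboptimal query plan:
-> Parallel Seq Scan on events (cost=0.00..242553.74 rows=45952 width=4)
(actual time=104.085..183.994 rows=64 loops=3)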
Run
ANALYZE events;
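and retry. Your original query can use the plain B-tree index. It is just not as efficient as the suggested SP-GiST index.
Then maybe tune your autovacuum and statistics settings to avoid bad statistics in the future.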
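What such tuning could look like is sketched below. The specific numbers (0.02 and 1000) are illustrative assumptions, not recommendations derived from this table; the built-in defaults are 0.1 for autovacuum_analyze_scale_factor and 100 for the statistics target.

```sql
-- When were statistics last gathered for the table?
SELECT relname, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'events';

-- Let autovacuum re-analyze this table after about 2 % of its rows changed.
ALTER TABLE events SET (autovacuum_analyze_scale_factor = 0.02);

-- Collect more detailed statistics for the two filter columns.
ALTER TABLE events ALTER COLUMN time_start SET STATISTICS 1000;
ALTER TABLE events ALTER COLUMN time_end   SET STATISTICS 1000;

-- A changed statistics target only takes effect at the next ANALYZE.
ANALYZE events;
```

Per-table settings like these leave the server-wide configuration untouched, which makes them easy to try and easy to revert.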