Dej*_*ell 8 postgresql performance index explain
I have a table with 165M records, defined as follows:
Performance
id integer
installs integer
hour timestamp without time zone
I also have an index on hour:
CREATE INDEX hour_idx
ON performance
USING btree
(hour DESC NULLS LAST);
However, selecting the top 10 records ordered by hour takes 6 minutes!
EXPLAIN ANALYZE select hour from performance order by hour desc limit 10
returns:
Limit (cost=7952135.23..7952135.25 rows=10 width=8) (actual time=376313.958..376313.964 rows=10 loops=1)
-> Sort (cost=7952135.23..8368461.00 rows=166530310 width=8) (actual time=376313.957..376313.960 rows=10 loops=1)
Sort Key: hour
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on performance (cost=0.00..4353475.10 rows=166530310 width=8) (actual time=0.006..327149.828 rows=192330557 loops=1)
Planning time: 0.070 ms
Execution time: 376330.573 ms
Why does this take so long? With a descending index on the date field, shouldn't retrieving the data be very fast?
小智 18
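In your example code above, the index is explicitly created with NULLS LAST, while your query implicitly runs with NULLS FIRST (the default for ORDER BY .. DESC). If PostgreSQL used the index, it would therefore still have to re-sort the data. As a result, the index would actually make the query many times slower than the (already slow) table scan. The session below reproduces the setup on a smaller table (166,530 rows).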
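To see the default NULL placement in isolation, here is a minimal sketch. The temporary table `null_order_demo` and its three rows are hypothetical and are not part of the question's data.

```sql
-- Hypothetical scratch table, used only to illustrate NULL placement.
CREATE TEMP TABLE null_order_demo (hour timestamp without time zone);

INSERT INTO null_order_demo (hour) VALUES
    ('2018-01-01 00:00:00'),
    ('2018-01-02 00:00:00'),
    (NULL);

-- DESC without a NULLS clause means DESC NULLS FIRST:
-- the NULL row comes back before the two timestamps.
SELECT hour FROM null_order_demo ORDER BY hour DESC;

-- With an explicit NULLS LAST the NULL row comes back after them,
-- which is the order an index declared (hour DESC NULLS LAST) stores.
SELECT hour FROM null_order_demo ORDER BY hour DESC NULLS LAST;
```

The two result sets differ only in where the NULL row appears, and that difference is enough to stop the planner from treating the index order as the requested order.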
rds-9.6.5 root@db1=> create table performance (id integer, installs integer, hour timestamp without time zone);
CREATE TABLE
Time: 28.100 ms
rds-9.6.5 root@db1=> with generator as (select generate_series(1,166530) i)
[more] - > insert into performance (
[more] ( > select
[more] ( > i id,
[more] ( > (random()*1000)::integer installs,
[more] ( > (now() - make_interval(secs => i))::timestamp hour
[more] ( > from generator
[more] ( > );
INSERT 0 166530
Time: 244.872 ms
rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour desc nulls last);
CREATE INDEX
Time: 67.089 ms
rds-9.6.5 root@db1=> vacuum analyze performance;
VACUUM
Time: 43.552 ms
We can add a WHERE clause on the hour column so that using the index becomes worthwhile, but note that the data read from the index still has to be re-sorted.
rds-9.6.5 root@db1=> explain select hour from performance where hour>now() order by hour desc limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------
Limit (cost=4.45..4.46 rows=1 width=8)
-> Sort (cost=4.45..4.46 rows=1 width=8)
Sort Key: hour DESC
-> Index Only Scan using hour_idx on performance (cost=0.42..4.44 rows=1 width=8)
Index Cond: (hour > now())
(5 rows)
Time: 0.789 ms
If we add an explicit NULLS LAST to your query, it uses the index as expected.
rds-9.6.5 root@db1=> explain select hour from performance order by hour desc NULLS LAST limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Limit (cost=0.42..0.68 rows=10 width=8)
-> Index Only Scan using hour_idx on performance (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)
Time: 0.526 ms
Alternatively, if we drop the (non-default) NULLS LAST from your index, the unmodified query uses it as expected.
rds-9.6.5 root@db1=> drop index hour_idx;
DROP INDEX
Time: 4.124 ms
rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour desc);
CREATE INDEX
Time: 69.220 ms
rds-9.6.5 root@db1=> explain select hour from performance order by hour desc limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Limit (cost=0.42..0.68 rows=10 width=8)
-> Index Only Scan using hour_idx on performance (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)
Time: 0.725 ms
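Note that you can also drop DESC from the index. PostgreSQL can scan an index both forward and backward, so a single-column index generally does not need to be declared in reverse order. You only need to make sure the combination of sort direction and NULLS FIRST/LAST matches.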
rds-9.6.5 root@db1=> drop index hour_idx;
DROP INDEX
Time: 3.837 ms
rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour);
CREATE INDEX
Time: 94.815 ms
rds-9.6.5 root@db1=> explain select hour from performance order by hour desc limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Limit (cost=0.42..0.68 rows=10 width=8)
-> Index Only Scan Backward using hour_idx on performance (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)
Time: 0.740 ms
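As a rough way to check which orderings a given index definition can serve, the sketch below assumes the `performance` table and the plain `(hour)` index from the last step. The actual plans depend on table statistics and the PostgreSQL version, so treat the comments as expectations to verify with EXPLAIN rather than guaranteed output.

```sql
-- Assumes: CREATE INDEX hour_idx ON performance USING btree (hour);
-- A plain btree index stores its entries as ASC NULLS LAST.

-- Same order as the index: can be read with a forward index scan.
EXPLAIN SELECT hour FROM performance ORDER BY hour LIMIT 10;

-- Exact reverse of the index order (DESC implies NULLS FIRST):
-- can be read with a backward index scan.
EXPLAIN SELECT hour FROM performance ORDER BY hour DESC LIMIT 10;

-- Neither the index order nor its reverse, because the NULLs sit
-- at the wrong end: expect a Sort step on top of the scan.
EXPLAIN SELECT hour FROM performance ORDER BY hour ASC NULLS FIRST LIMIT 10;
EXPLAIN SELECT hour FROM performance ORDER BY hour DESC NULLS LAST LIMIT 10;
```

The same reasoning applies to the original `(hour DESC NULLS LAST)` index: it matches `ORDER BY hour DESC NULLS LAST` read forward and `ORDER BY hour ASC NULLS FIRST` read backward, but not a bare `ORDER BY hour DESC`.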