Selecting the top 10 rows by an indexed column of a large table takes too long

Dej*_*ell 8 postgresql performance index explain

I have a table with 165M records that looks like this:

Performance
   id        integer
   installs  integer
   hour      timestamp without time zone

I also have an index on hour:

CREATE INDEX hour_idx
  ON performance
  USING btree
  (hour DESC NULLS LAST);

However, selecting the top 10 records ordered by hour takes 6 minutes!

EXPLAIN ANALYZE  select hour from performance order by hour desc limit 10

which returns:

Limit  (cost=7952135.23..7952135.25 rows=10 width=8) (actual time=376313.958..376313.964 rows=10 loops=1)
  ->  Sort  (cost=7952135.23..8368461.00 rows=166530310 width=8) (actual time=376313.957..376313.960 rows=10 loops=1)
        Sort Key: hour
        Sort Method: top-N heapsort  Memory: 25kB
        ->  Seq Scan on performance  (cost=0.00..4353475.10 rows=166530310 width=8) (actual time=0.006..327149.828 rows=192330557 loops=1)
Planning time: 0.070 ms
Execution time: 376330.573 ms

Why does this take so long? With a descending index on the date field, shouldn't retrieving the data be very fast?

小智 18

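In the sample code above, the index is explicitly created with NULLS LAST, while the query implicitly runs with NULLS FIRST (the default for ORDER BY .. DESC). If PostgreSQL used the index, it would therefore have to re-sort the data, which would actually make the query many times slower than the (already slow) table scan.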
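To make the mismatch concrete, here is a small sketch that spells out the defaults against the table and index from the question. The comments restate PostgreSQL's documented ordering defaults; the last statement is just one way to check how an existing index was declared.

```sql
-- Defaults: ASC implies NULLS LAST, DESC implies NULLS FIRST.
-- These two queries therefore ask for the same ordering:
SELECT hour FROM performance ORDER BY hour DESC LIMIT 10;
SELECT hour FROM performance ORDER BY hour DESC NULLS FIRST LIMIT 10;

-- An index declared as (hour DESC NULLS LAST) can only deliver:
--   hour DESC NULLS LAST   (forward scan)
--   hour ASC  NULLS FIRST  (backward scan)
-- Neither matches the ordering requested above.

-- Check how the indexes on the table were declared:
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'performance';
```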

rds-9.6.5 root@db1=> create table performance (id integer, installs integer, hour timestamp without time zone);
CREATE TABLE
Time: 28.100 ms

rds-9.6.5 root@db1=> with generator as (select generate_series(1,166530) i)
[more] - > insert into performance (
[more] ( >   select
[more] ( >     i id,
[more] ( >     (random()*1000)::integer installs,
[more] ( >     (now() - make_interval(secs => i))::timestamp as hour
[more] ( >   from generator
[more] ( > );
INSERT 0 166530
Time: 244.872 ms

rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour desc nulls last);
CREATE INDEX
Time: 67.089 ms

rds-9.6.5 root@db1=> vacuum analyze performance;
VACUUM
Time: 43.552 ms

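We can add a WHERE clause on the hour column so that using the index becomes worthwhile, but note that the data coming out of the index still has to be re-sorted.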

rds-9.6.5 root@db1=> explain select hour from performance where hour>now() order by hour desc limit 10;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Limit  (cost=4.45..4.46 rows=1 width=8)
   ->  Sort  (cost=4.45..4.46 rows=1 width=8)
         Sort Key: hour DESC
         ->  Index Only Scan using hour_idx on performance  (cost=0.42..4.44 rows=1 width=8)
               Index Cond: (hour > now())
(5 rows)

Time: 0.789 ms

If we add an explicit NULLS LAST to your query, it uses the index as expected.

rds-9.6.5 root@db1=> explain select hour from performance order by hour desc NULLS LAST limit 10;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.68 rows=10 width=8)
   ->  Index Only Scan using hour_idx on performance  (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)

Time: 0.526 ms

Alternatively, if we drop the (non-default) NULLS LAST from your index, the query uses it as expected without modification.

rds-9.6.5 root@db1=> drop index hour_idx;
DROP INDEX
Time: 4.124 ms

rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour desc);
CREATE INDEX
Time: 69.220 ms

rds-9.6.5 root@db1=> explain select hour from performance order by hour desc limit 10;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.68 rows=10 width=8)
   ->  Index Only Scan using hour_idx on performance  (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)

Time: 0.725 ms 
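On the original 165M-row table, a plain DROP INDEX / CREATE INDEX pair would block writes while the index is rebuilt. The following is a minimal sketch of swapping the index in with less locking; it is not part of the session above, and hour_idx_new is a hypothetical temporary name chosen for illustration.

```sql
-- Sketch only: hour_idx_new is a hypothetical temporary name.
-- CONCURRENTLY cannot run inside a transaction block,
-- so each statement is issued on its own.

-- 1. Build the replacement index without blocking writes.
CREATE INDEX CONCURRENTLY hour_idx_new
  ON performance
  USING btree
  (hour DESC);

-- 2. Drop the old NULLS LAST index once the new one has been built.
DROP INDEX CONCURRENTLY hour_idx;

-- 3. Optionally restore the original name.
ALTER INDEX hour_idx_new RENAME TO hour_idx;
```

A concurrent build takes longer than a regular one and can leave an invalid index behind if it fails, so it is worth checking the result before dropping the old index.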

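Note that you could also drop DESC from the index; PostgreSQL can scan an index both forward and backward, and on a single-column index there is usually no need to reverse it. You only need to pay attention to the right combination of sort order and nulls first/last.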

rds-9.6.5 root@db1=> drop index hour_idx;
DROP INDEX
Time: 3.837 ms

rds-9.6.5 root@db1=> create index hour_idx
[more] - > on performance
[more] - > using btree
[more] - > (hour);
CREATE INDEX
Time: 94.815 ms

rds-9.6.5 root@db1=> explain select hour from performance order by hour desc limit 10;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..0.68 rows=10 width=8)
   ->  Index Only Scan Backward using hour_idx on performance  (cost=0.42..4334.37 rows=166530 width=8)
(2 rows)

Time: 0.740 ms
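As a rough summary of that last point, the sketch below lists which ORDER BY variants the plain (hour) index should be able to serve directly and which are expected to need a separate sort. The grouping follows the documented rule that a b-tree index can be read in its stored order or the exact reverse; running the EXPLAIN statements on your own data is the way to confirm it.

```sql
-- Index as created last: btree (hour), i.e. hour ASC NULLS LAST.

-- Expected to be read straight from the index:
EXPLAIN SELECT hour FROM performance ORDER BY hour LIMIT 10;       -- forward scan
EXPLAIN SELECT hour FROM performance ORDER BY hour DESC LIMIT 10;  -- backward scan (DESC NULLS FIRST)

-- Expected to need a Sort node, because the null placement
-- matches neither scan direction:
EXPLAIN SELECT hour FROM performance ORDER BY hour DESC NULLS LAST LIMIT 10;
EXPLAIN SELECT hour FROM performance ORDER BY hour NULLS FIRST LIMIT 10;
```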