Wrong execution plan in Postgres with a partial index and a large IN clause

Ale*_*øld 5 postgresql index execution-plan postgresql-11

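Postgres appears to choose a sequential scan in a case where it could use a partial index to get an index-only scan. This only happens when the IN clause has more than 100 elements.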

Given the following table:

create table foo(id bigint primary key, bar bigint); 

insert into foo (id, bar) 
select g.id, case when id % 1000 = 0 then id else null end
from generate_series(1, 10000000) AS g (id) ;

--Create partial index
create unique index ix_foo_bar on foo(bar) where bar is not null;

analyze foo;

And given the following query with a large IN clause:

explain analyze select count(*) from foo where bar in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101);

The query plan shows a sequential scan. It is slow and has a very high estimated cost:

QUERY PLAN                                                                             
------------------------------------
 Finalize Aggregate  (cost=612955.35..612955.36 rows=1 width=8) (actual time=254.605..254.605 rows=1 loops=1)
   ->  Gather  (cost=612955.13..612955.34 rows=2 width=8) (actual time=254.474..258.242 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=611955.13..611955.14 rows=1 width=8) (actual time=247.743..247.744 rows=1 loops=3)
               ->  Parallel Seq Scan on foo  (cost=0.00..611955.03 rows=42 width=0) (actual time=247.740..247.740 rows=0 loops=3)
                     Filter: (bar = ANY ('{1,2,3,4,5(...)
                     Rows Removed by Filter: 3333333
 Planning Time: 0.867 ms
 Execution Time: 258.323 ms

Setting enable_seqscan to off has no effect; it still does a sequential scan.

If I add "is not null" to the query, it uses the index:

explain analyze select count(*) from foo where bar is not null and bar in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101);

This gives an index-only scan:

    QUERY PLAN                                                                                                                                                                        
----------------------------------------------------------------------
 Aggregate  (cost=153.55..153.56 rows=1 width=8) (actual time=0.267..0.267 rows=1 loops=1)
   ->  Index Only Scan using ix_foo_bar on foo  (cost=0.29..153.55 rows=1 width=0) (actual time=0.262..0.262 rows=0 loops=1)
         Index Cond: (bar = ANY ('{1,2,3,4,5,6 (...), 101}'::bigint[]))
         Heap Fetches: 0
 Planning Time: 0.531 ms
 Execution Time: 0.319 ms
(6 rows)

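It also uses the index if I have fewer elements in the IN clause (the cutoff is 100 versus 101 elements), or if I have a full index instead of a partial one.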

Why doesn't Postgres use the partial index when the IN clause has more than 100 elements? Is this a known limitation of the query planner, or a bug?

jja*_*nes 3

This will be fixed in the upcoming version 12.

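I think the summary here is that we are only willing to do so much work to try to prove that a partial index can be used, because that work has to be done for every query, even those that end up not using the partial index. With this change, they simply found a more efficient way to do that work in this specific case, so the 100-element limit no longer needs to be imposed on it.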
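
To check the cutoff on a given server, something like the following sketch could be used. It assumes the table foo and the partial index ix_foo_bar from the question already exist. The variable names and the choice of a DO block are only for illustration. It builds IN lists of 100 and 101 elements and prints the plan for each, plus the variant from the question with an explicit "bar IS NOT NULL" added:

```sql
-- Sketch only: assumes foo and ix_foo_bar from the question exist.
DO $$
DECLARE
    list100  text;
    list101  text;
    variant  text;   -- WHERE clause being tested (illustrative name)
    plan_row text;   -- one line of EXPLAIN output
BEGIN
    -- Build the comma-separated lists 1..100 and 1..101.
    SELECT string_agg(i::text, ',' ORDER BY i) INTO list100
    FROM generate_series(1, 100) AS i;
    SELECT string_agg(i::text, ',' ORDER BY i) INTO list101
    FROM generate_series(1, 101) AS i;

    FOREACH variant IN ARRAY ARRAY[
        format('bar IN (%s)', list100),
        format('bar IN (%s)', list101),
        format('bar IS NOT NULL AND bar IN (%s)', list101)
    ] LOOP
        -- Print a shortened form of the WHERE clause as a heading.
        RAISE NOTICE 'WHERE % ...', left(variant, 40);
        -- EXPLAIN (without ANALYZE) only plans the query; it does not run it.
        FOR plan_row IN EXECUTE
            'EXPLAIN SELECT count(*) FROM foo WHERE ' || variant
        LOOP
            RAISE NOTICE '    %', plan_row;
        END LOOP;
    END LOOP;
END
$$;
```

Going by the behaviour reported in the question, on version 11 the second variant would be expected to show the sequential scan, while the first and third would use ix_foo_bar. On version 12 or later all three would presumably use the index. Until an upgrade is possible, repeating the index predicate in the query, as in the third variant, is the workaround the question itself already found.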