PostgreSQL: adding LIMIT to a query stops it from using the index

Question


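I have a large table with a BRIN index. If I run a query with LIMIT, the planner ignores the index and does a sequential scan; without LIMIT, it uses the index. I tried this several times with the same result.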

explain (analyze,verbose,buffers,timing,costs) 
select *
from testj.cdc_s5_gpps_ind
where id_transformace = 1293
limit 100

Limit  (cost=0.00..349.26 rows=100 width=207) (actual time=28927.179..28927.214 rows=100 loops=1)
  Output: id, date_key_trainjr...
  Buffers: shared hit=225 read=1680241
  ->  Seq Scan on testj.cdc_s5_gpps_ind  (cost=0.00..3894204.10 rows=1114998 width=207) (actual time=28927.175..28927.202 rows=100 loops=1)
        Output: id, date_key_trainjr...
        Filter: (cdc_s5_gpps_ind.id_transformace = 1293)
        Rows Removed by Filter: 59204140
        Buffers: shared hit=225 read=1680241
Planning Time: 0.149 ms
Execution Time: 28927.255 ms

explain (analyze,verbose,buffers,timing,costs) 
select *
from testj.cdc_s5_gpps_ind
where id_transformace = 1293

Bitmap Heap Scan on testj.cdc_s5_gpps_ind  (cost=324.36..979783.34 rows=1114998 width=207) (actual time=110.103..467.008 rows=1073725 loops=1)
  Output: id, date_key_trainjr...
  Recheck Cond: (cdc_s5_gpps_ind.id_transformace = 1293)
  Rows Removed by Index Recheck: 11663
  Heap Blocks: lossy=32000
  Buffers: shared hit=32056
  ->  Bitmap Index Scan on gpps_brin_index  (cost=0.00..45.61 rows=1120373 width=0) (actual time=2.326..2.326 rows=320000 loops=1)
        Index Cond: (cdc_s5_gpps_ind.id_transformace = 1293)
        Buffers: shared hit=56
Planning Time: 1.343 ms
JIT:
  Functions: 2
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 0.540 ms, Inlining 32.246 ms, Optimization 44.423 ms, Emission 22.524 ms, Total 99.732 ms
Execution Time: 537.627 ms


Is there a reason for this behavior?

PostgreSQL 12.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), 64-bit

Answer 1

jja*_*nes 6

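There is a very simple (which is not to say good) reason for this. The planner assumes that the rows with id_transformace = 1293 are evenly distributed throughout the table, so it expects a seq scan to collect 100 of them very quickly and then stop early. But that assumption is badly wrong here: the scan has to read through a large chunk of the table before it finds 100 qualifying rows.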
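As a rough illustration of that reasoning, the Limit node's estimated cost in the question's plan can be reproduced by scaling the seq scan's total cost by the fraction of estimated matching rows that LIMIT needs. The numbers are copied from the plan; the column aliases are only illustrative.

```sql
-- Numbers taken from the Seq Scan and Limit nodes in the question's plan:
--   seq scan total cost        = 3894204.10
--   estimated matching rows    = 1114998
--   rows requested by LIMIT    = 100
select round(3894204.10 * 100 / 1114998, 2) as estimated_limit_cost,  -- 349.26, as shown on the Limit node
       round(100.0 / 1114998 * 100, 4)      as pct_of_scan_expected;  -- roughly 0.009 percent of the scan
```

Under the uniform-distribution assumption the planner expects to stop after a tiny fraction of the scan. The actual run instead discarded 59204140 rows before the first 100 matches turned up.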

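This assumption is not based on any statistics gathered on the table, so increasing the statistics target will not help. Extended statistics will not help either, since they only provide statistics between columns, not between a column and the physical ordering.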
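One way to see how the matching rows are physically laid out is to look at the heap block numbers in their ctid values. This is a diagnostic sketch only, not something the answer proposes; it relies on the common idiom of casting ctid through text to point to extract the block number, and the column aliases are illustrative.

```sql
-- Diagnostic sketch: where in the heap do the matching rows live?
-- (ctid::text::point)[0] is the block number part of the tuple id.
select min((ctid::text::point)[0]::bigint) as first_block,
       max((ctid::text::point)[0]::bigint) as last_block,
       count(*)                            as matching_rows
from testj.cdc_s5_gpps_ind
where id_transformace = 1293;
```

If first_block is far from 0, a seq scan has to read all the earlier blocks before it can return any row, which would be consistent with the slow LIMIT plan in the question.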

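There is no good, clean way to solve this purely on the stock server side. One workaround is to run set enable_seqscan=off before the query and reset it afterwards. Another is to add ORDER BY random() to your query, so the planner knows it cannot stop early. Or maybe the extension pg_hint_plan could help; I have never used it.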
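A minimal sketch of the first two workarounds, applied to the query from the question. Whether either one actually produces a faster plan on a given table is something to confirm with EXPLAIN.

```sql
-- Workaround 1: discourage seq scans for this query only, then restore the default.
set enable_seqscan = off;

select *
from testj.cdc_s5_gpps_ind
where id_transformace = 1293
limit 100;

reset enable_seqscan;

-- Workaround 2: force the planner to consider every matching row before applying LIMIT.
select *
from testj.cdc_s5_gpps_ind
where id_transformace = 1293
order by random()
limit 100;
```

Note that the second form has to fetch and rank all matching rows (about a million here) before returning 100 of them, and the 100 rows it returns are a random sample rather than the first ones found.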

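You might be able to change the plan by tweaking some of the *_cost parameters, but that would likely make other things worse. Looking at the output of EXPLAIN (ANALYZE, BUFFERS) for the LIMITed query run with enable_seqscan=off could inform that decision.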
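The comparison suggested above could be run like this. Using a transaction with SET LOCAL is just one way to keep the setting from leaking into the rest of the session; it is a choice made for this sketch, not part of the answer.

```sql
begin;

-- SET LOCAL lasts only until the end of this transaction.
set local enable_seqscan = off;

explain (analyze, buffers)
select *
from testj.cdc_s5_gpps_ind
where id_transformace = 1293
limit 100;

rollback;
```

Comparing the estimated cost of the resulting plan with the 349.26 estimated for the seq scan plan shows how large a gap any *_cost adjustment would have to close.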
