Why is Postgres scanning a huge table instead of using my index?

Question

ama*_*loy 5 sql postgresql indexing performance sql-execution-plan

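I noticed that one of my SQL queries is much slower than I expected, and it turns out the query planner is coming up with a plan that looks really bad to me. My query looks like this: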

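select A.style,
       -- count(B.x is null) would count every row, because the boolean is never NULL;
       -- count(*) - count(B.x) counts only the rows where B.x is null
       count(*) - count(B.x) as missing,
       count(*) as total
  from A left join B using (id, type)
  where A.country_code in ('US', 'DE', 'ES')
  group by A.country_code, A.style
  order by A.country_code, total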

B has an index on (type, id), and A has an index on (country_code, style). A is much smaller than B: A has 250K rows and B has 100M.

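So, I expected the query plan to look something like:

1. Use the index on A to select only the rows with an appropriate country_code
2. Left join with B, using its (type, id) index to find the matching row (if any)
3. Group things according to country_code and style
4. Add up the counts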

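But the query planner decides that the best way to do this is a sequential scan on B, followed by a right join against A. I can't fathom why that is; does anyone have ideas? Here's the actual query plan it generated: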

 Sort  (cost=14283513.27..14283513.70 rows=171 width=595)
   Sort Key: a.country_code, (count(*))
   ->  HashAggregate  (cost=14283505.22..14283506.93 rows=171 width=595)
         ->  Hash Right Join  (cost=8973.71..14282810.03 rows=55615 width=595)
               Hash Cond: ((b.type = a.type) AND (b.id = a.id))
               ->  Seq Scan on b (cost=0.00..9076222.44 rows=129937844 width=579)
               ->  Hash  (cost=8139.49..8139.49 rows=55615 width=28)
                     ->  Bitmap Heap Scan on a  (cost=1798.67..8139.49 rows=55615 width=28)
                           Recheck Cond: ((country_code = ANY ('{US,DE,ES}'::bpchar[])))
                           ->  Bitmap Index Scan on a_country_code_type_idx  (cost=0.00..1784.76 rows=55615 width=0)
                                 Index Cond: ((country_code = ANY ('{US,DE,ES}'::bpchar[])))

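Edit: Following a clue from the comments on another question, I tried SET ENABLE_SEQSCAN TO OFF;, and the query runs ten times as fast. Obviously I don't want to permanently disable sequential scans, but this helps confirm my otherwise baseless guess that the sequential scan is not the best plan available.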
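
One way to repeat that experiment without touching the setting for any other session is to scope it to a single transaction. The following is only a sketch: it assumes the tables A and B from the query above, and it writes the "missing" count as count(*) - count(B.x), which is an assumption about what that column is meant to report.

```sql
-- Sketch: compare plans with sequential scans discouraged, for this transaction only.
BEGIN;

-- SET LOCAL lasts until the end of the current transaction,
-- so the setting reverts automatically at ROLLBACK (or COMMIT).
SET LOCAL enable_seqscan = off;

-- EXPLAIN ANALYZE actually runs the query and reports estimated
-- versus actual row counts and timings for each plan node.
EXPLAIN (ANALYZE, BUFFERS)
select A.style,
       count(*) - count(B.x) as missing,  -- assumed meaning: rows where B.x is null
       count(*) as total
  from A left join B using (id, type)
  where A.country_code in ('US', 'DE', 'ES')
  group by A.country_code, A.style
  order by A.country_code, total;

ROLLBACK;
```

Running the same EXPLAIN (ANALYZE, BUFFERS) statement outside the transaction gives the default plan to compare against. A large gap between the estimated and the actual "rows" on a node usually points at where the planner's estimate goes wrong.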

Answer 1

Erw*_*ter 6

If your added test proves that the query is actually faster with an index scan, then it's usually one or both of these:

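- Your statistics are off, or not precise enough to cover an irregular data distribution.
- Your cost settings, which Postgres uses to base its cost estimates on, are off.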

Details for both in this closely related answer:

Keep PostgreSQL from sometimes choosing a bad query plan
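
As a rough illustration of the two points above, the commands below show how statistics and cost settings can be adjusted and tested. This is a sketch, not a recommendation: the table and column names come from the question, and every numeric value is an illustrative assumption that would need tuning for the actual hardware and data.

```sql
-- 1. Statistics: refresh them so the planner works from current data.
ANALYZE a;
ANALYZE b;

-- If estimates for a column remain poor, collect a larger sample for it
-- and analyze again. 1000 is an illustrative target, not a tuned value.
ALTER TABLE a ALTER COLUMN country_code SET STATISTICS 1000;
ANALYZE a;

-- 2. Cost settings: try new values for the current session only,
--    then re-run EXPLAIN ANALYZE on the slow query to see whether the plan changes.
SET random_page_cost = 1.1;         -- default is 4.0; lower values favor index scans
SET effective_cache_size = '4GB';   -- illustrative; should roughly reflect memory available for caching

-- Return to the configured defaults when the experiment is done.
RESET random_page_cost;
RESET effective_cache_size;
```

Session-level SET keeps the experiment reversible. Values that prove useful can later be made permanent in postgresql.conf or per database, which is a separate decision from the test itself.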

Asked:	11 years, 6 months ago
Viewed:	1872 times
Last active:	8 years, 1 month ago