Why does Postgres still do a bitmap heap scan when a covering index is used?

Question

The table looks like this:

CREATE TABLE "audit_log" (
  "id" int4 NOT NULL DEFAULT nextval('audit_log_id_seq'::regclass),
  "entity" varchar(50) COLLATE "public"."ci",
  "updated" timestamp(6) NOT NULL,
  "transaction_id" uuid,
  CONSTRAINT "PK_audit_log" PRIMARY KEY ("id")
);

It contains several million rows.

I tried adding an index on a single column, like this:

CREATE INDEX "testing" ON "audit_log" USING btree (
  "entity" COLLATE "public"."ci" "pg_catalog"."text_ops" ASC NULLS LAST
);

Then I ran the following query, which selects the indexed column and the primary key:

EXPLAIN ANALYZE SELECT entity, id FROM audit_log WHERE entity = 'abcd'

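As I expected, the query plan uses both a Bitmap Index Scan (presumably to look up the "entity" column) and a Bitmap Heap Scan (I assume to retrieve the "id" column):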

Gather  (cost=2640.10..260915.23 rows=87166 width=122) (actual time=2.828..3.764 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Bitmap Heap Scan on audit_log  (cost=1640.10..251198.63 rows=36319 width=122) (actual time=0.061..0.062 rows=0 loops=3)
        Recheck Cond: ((entity)::text = '1234'::text)
        ->  Bitmap Index Scan on testing  (cost=0.00..1618.31 rows=87166 width=0) (actual time=0.036..0.036 rows=0 loops=1)
              Index Cond: ((entity)::text = '1234'::text)

Next, I added an INCLUDE column to the index so that it covers the query above:

DROP INDEX testing;

CREATE INDEX testing ON audit_log USING btree (
    "entity" COLLATE "public"."ci" "pg_catalog"."text_ops" ASC NULLS LAST
)
INCLUDE
(
  "id"
);

I then re-ran the query, but it still does a Bitmap Heap Scan:

Gather  (cost=2964.10..261239.23 rows=87166 width=122) (actual time=2.711..3.570 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Bitmap Heap Scan on audit_log  (cost=1964.10..251522.63 rows=36319 width=122) (actual time=0.062..0.062 rows=0 loops=3)
        Recheck Cond: ((entity)::text = '1234'::text)
        ->  Bitmap Index Scan on testing  (cost=0.00..1942.31 rows=87166 width=0) (actual time=0.029..0.029 rows=0 loops=1)
              Index Cond: ((entity)::text = '1234'::text)

Why is that?

Answer 1

Lau*_*lbe 14

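PostgreSQL implements row versioning using a concept called visibility. Every query knows which versions of a row it can see.

Now, the visibility information is stored in the table row, not in the index entry, so the table has to be visited to test whether a row is visible.

Because of that, every bitmap index scan needs a bitmap heap scan.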

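To overcome this unfortunate property, PostgreSQL introduced the visibility map, a data structure that stores, for each 8kB block of the table, whether all rows in that block are visible to everybody. If that is the case, looking up the table row can be skipped. This only works for a regular index scan, not for a bitmap index scan.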
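The idea of skipping heap lookups for all-visible blocks can be illustrated with a toy model. This is only a sketch and not PostgreSQL's implementation: `ToyVisibilityMap` and `toy_heap_fetches` are hypothetical names, and the map is assumed to be a plain array with one flag per heap block.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: one "all-visible" flag per heap block. */
typedef struct {
    const bool *all_visible; /* true if every row in the block is visible to all */
    size_t nblocks;          /* number of heap blocks covered by the map */
} ToyVisibilityMap;

/*
 * Count the index entries for which an index-only scan would still have
 * to visit the heap in this toy model. entry_blocks[i] is the heap block
 * that index entry i points to.
 */
size_t toy_heap_fetches(const ToyVisibilityMap *vm,
                        const size_t *entry_blocks, size_t nentries)
{
    size_t fetches = 0;

    for (size_t i = 0; i < nentries; i++) {
        size_t blk = entry_blocks[i];

        /* Blocks outside the map or not all-visible need a heap visit. */
        if (blk >= vm->nblocks || !vm->all_visible[blk])
            fetches++;
    }
    return fetches;
}
```

In this model, a table whose blocks are all flagged all-visible yields zero heap fetches, which is the situation in which an index-only scan pays off.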

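The visibility map is maintained by VACUUM. So run VACUUM on the table, and you may then get an index-only scan on the table.

If that alone is not enough, you could try CLUSTER to rewrite the table in index order.

Here are some details on how PostgreSQL estimates the cost of an index scan. The following code is from cost_index in src/backend/optimizer/path/costsize.c: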

    /*----------
[...]
     * If it's an index-only scan, then we will not need to fetch any heap
     * pages for which the visibility map shows all tuples are visible.
     * Hence, reduce the estimated number of heap fetches accordingly.
     * We use the measured fraction of the entire heap that is all-visible,
     * which might not be particularly relevant to the subset of the heap
     * that this query will fetch; but it's not clear how to do better.
     *----------
     */

[...]

        if (indexonly)
            pages_fetched = ceil(pages_fetched * (1.0 - baserel->allvisfrac));

allvisfrac is calculated using pg_class.relallvisible, which holds an estimate of the number of all-visible pages in the table, and pg_class.relpages.
