Postgres partial index on IS NULL not working

Question


Gee*_*ock 3 postgresql performance index postgresql-10 postgresql-performance

Postgres version

Using PostgreSQL 10.3.

Table definition

CREATE TABLE tickets (
  id bigserial primary key,
  state character varying,
  closed timestamp
);

CREATE INDEX "state_index" ON "tickets" ("state")
  WHERE ((state)::text = 'open'::text);


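Cardinality

The table contains 1027616 rows, with 51533 of the rows having state = 'open' and closed IS NULL, or 5%.

A query with a condition on state performs well using an index scan, as expected: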

explain analyze select * from tickets where state = 'open';

Index Scan using state_index on tickets  (cost=0.29..16093.57 rows=36599 width=212) (actual time=0.025..22.875 rows=37346 loops=1)
Planning time: 0.212 ms
Execution time: 25.697 ms


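I am trying to achieve the same or better performance for a query with the condition closed IS NULL, so that I can drop the state column and rely on the closed column to fetch the same rows. closed is NULL for exactly the rows where state = 'open', so the state column is redundant.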

select * from tickets where closed IS NULL;


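However, none of the indexes I have tried result in a single index scan like the first query does. Below are the indexes I tried, along with the EXPLAIN ANALYZE results.

A partial index:

CREATE INDEX "closed_index" ON "tickets" ("closed") WHERE (closed IS NULL);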

explain analyze select * from tickets where closed IS NULL;

Bitmap Heap Scan on tickets  (cost=604.22..38697.91 rows=36559 width=212) (actual time=12.879..48.780 rows=37348 loops=1)
  Recheck Cond: (closed IS NULL)
  Heap Blocks: exact=14757
  ->  Bitmap Index Scan on closed_index  (cost=0.00..595.09 rows=36559 width=0) (actual time=7.585..7.585 rows=37348 loops=1)
Planning time: 4.831 ms
Execution time: 52.068 ms


An expression index:

CREATE INDEX "closed_index" ON "tickets" ((closed IS NULL));

explain analyze select * from tickets where closed IS NULL;

Seq Scan on tickets  (cost=0.00..45228.26 rows=36559 width=212) (actual time=0.025..271.418 rows=37348 loops=1)
  Filter: (closed IS NULL)
  Rows Removed by Filter: 836578
Planning time: 7.992 ms
Execution time: 274.504 ms


A partial expression index:

CREATE INDEX  "closed_index" ON "tickets" ((closed IS NULL))
  WHERE (closed IS NULL);

explain analyze select * from tickets where closed IS NULL;

Bitmap Heap Scan on tickets  (cost=604.22..38697.91 rows=36559 width=212) (actual time=177.109..238.008 rows=37348 loops=1)
  Recheck Cond: (closed IS NULL)
  Heap Blocks: exact=14757
  ->  Bitmap Index Scan on "closed_index"  (cost=0.00..595.09 rows=36559 width=0) (actual time=174.598..174.598 rows=37348 loops=1)
Planning time: 23.063 ms
Execution time: 241.292 ms


Update

Expanded table definition:

CREATE TABLE tickets (
  id bigserial primary key,
  state character varying,
  closed timestamp,
  created timestamp,
  updated timestamp,
  title character varying,
  size integer NOT NULL,
  comment_count integer NOT NULL
);

CREATE INDEX "state_index" ON "tickets" ("state")
  WHERE ((state)::text = 'open'::text);


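Cardinality:

The table contains 1027616 rows, with 51533 of the rows having state = 'open' and closed IS NULL, or 5%. As stated above, I am trying to drop the state column, so I want to be able to fetch the same rows with a condition on the closed column.

A query with a condition on the state column uses an index scan.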

explain analyze select id, title, created, closed, updated from tickets where state = 'open';

Index Scan using state_index on tickets  (cost=0.29..22901.58 rows=49356 width=72) (actual time=0.107..49.599 rows=51533 loops=1)
Planning time: 0.511 ms
Execution time: 54.366 ms


I want the same performance (ideally an index scan) after switching the query to the closed column. I tried a partial index on id with the condition closed IS NULL:

CREATE INDEX closed_index ON tickets (id) WHERE closed IS NULL;

VACUUM ANALYZE tickets;

explain analyze select id, title, created, closed, updated from tickets where closed IS NULL;

Bitmap Heap Scan on tickets  (cost=811.96..33999.42 rows=49461 width=72) (actual time=7.868..47.080 rows=51537 loops=1)
  Recheck Cond: (closed IS NULL)
  Heap Blocks: exact=17479
  ->  Bitmap Index Scan on closed_index  (cost=0.00..799.60 rows=49461 width=0) (actual time=4.868..4.868 rows=51537 loops=1)
Planning time: 0.222 ms
Execution time: 51.028 ms


Answer 1

Erw*_*ter 8

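Assuming the core information:

about 15% of the rows have state = 'open' and closed IS NULL

That should mean that 15% of all 1031584 rows satisfy both conditions (all details matter!). Both conditions should then perform the same, each returning around 155k rows (!)

Your query plans show 37346 qualifying rows, about 3.6% rather than 15%. Something in your question is still off.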

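With 3.6%, indexes only begin to make sense. Your small rows effectively occupy about 52 bytes each, which gives around 155 rows per page. With a completely random distribution, that would be 5 to 6 hits per page. Postgres has to read all pages anyway, so a sequential scan should be the fastest plan. Filtering out the misses should be faster than involving an index in any way.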
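The rows-per-page estimate can be checked against the planner statistics. This is only a sketch: it assumes the table is named tickets as in the question, that ANALYZE has run recently (relpages and reltuples are estimates), and it plugs in the 3.6% selectivity from the query plans as a literal.

```sql
-- Average rows per data page, from planner statistics in pg_class.
-- relpages and reltuples are estimates maintained by VACUUM / ANALYZE.
SELECT relpages,
       reltuples::bigint AS estimated_rows,
       round((reltuples / NULLIF(relpages, 0))::numeric, 1) AS rows_per_page,
       -- expected matches per page at 3.6% selectivity,
       -- assuming a completely random distribution
       round((0.036 * reltuples / NULLIF(relpages, 0))::numeric, 1) AS expected_hits_per_page
FROM   pg_class
WHERE  oid = 'tickets'::regclass;
```

If expected_hits_per_page is well above 1, nearly every page holds a match and an index cannot save much heap I/O.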

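Typically, qualifying rows are more or less clustered, and the fewer data pages are involved, the more sense it makes to use an index. Even then I would expect bitmap index scans throughout; I see hardly any case for a plain index scan. For the 15% you claimed, the case is weaker still.

For your updated numbers (about 5% of all rows match), I would still expect a bitmap index scan rather than an index scan. One possible explanation: table bloat with many dead tuples. You mentioned a high write load. That leaves fewer and fewer live tuples per data page and favors index scans (as compared to bitmap index scans). You could retest your initial query after a VACUUM FULL ANALYZE (if you can afford an exclusive lock on the table!). If my hypothesis holds, the physical table size will shrink substantially, and you will then see a bitmap index scan instead of an index scan (and it will be faster, too).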
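One way to test the bloat hypothesis before taking an exclusive lock is to look at the statistics collector's tuple counts and the physical table size. A minimal sketch, assuming the table name tickets from the question; the counts are estimates, not exact figures.

```sql
-- Live vs. dead tuples and the last (auto)vacuum times for the table.
SELECT n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
       last_vacuum,
       last_autovacuum
FROM   pg_stat_user_tables
WHERE  relname = 'tickets';

-- Physical size of the heap; compare before and after VACUUM FULL.
SELECT pg_size_pretty(pg_relation_size('tickets')) AS heap_size;
```

A high dead_pct, or a heap that shrinks a lot after VACUUM FULL, would support the explanation above.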

You may need more aggressive autovacuum settings. See:

Aggressive Autovacuum on PostgreSQL
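Autovacuum can be tuned for a single table with storage parameters. The values below are illustrative assumptions rather than recommendations from the answer; suitable settings depend on the actual write load.

```sql
-- Per-table autovacuum overrides for a heavily written table.
-- The numbers are placeholders chosen for illustration only.
ALTER TABLE tickets SET (
  autovacuum_vacuum_scale_factor  = 0.01,  -- global default is 0.2
  autovacuum_analyze_scale_factor = 0.01   -- global default is 0.1
);

-- Revert to the global settings:
-- ALTER TABLE tickets RESET (autovacuum_vacuum_scale_factor, autovacuum_analyze_scale_factor);
```

Lower scale factors make autovacuum trigger after a smaller fraction of the table has changed, which keeps dead tuples from accumulating.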

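Partial indexes

Your "expression index" and "partial expression index" are not useful. We don't need closed IS NULL as the actual index expression (it is always true here). The expression only adds cost with no gain.

The first, simple partial index is the more useful variant. But don't use closed as the index expression (again, it is always NULL here). Instead, use any column that may be useful for other queries, ideally one that is never updated, to avoid extra cost and index bloat. Absent other useful applications, the primary key column id is the natural candidate: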

CREATE INDEX closed_index ON tickets (id) WHERE closed IS NULL;


Or, if id is of no use, consider a constant:

CREATE INDEX closed_index ON tickets ((1)) WHERE closed IS NULL;


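This makes the actual index column as useless as in the other disregarded variants, but it avoids all additional cost and dependencies. Related:

Get count estimates from pg_class.reltuples for given conditions

I might try:

Update for your updated question. This only makes sense if there are not many other writes to the rows in question (the added columns updated and comment_count make me doubt that).

Create a partial index with id and the other relevant columns (few and small) as index expressions, and exploit it with a fitting query to get an index-only scan: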

CREATE INDEX closed_index ON tickets (id, title, created, updated)
WHERE closed IS NULL;

VACUUM ANALYZE tickets;   -- just to prove idx-only is possible

SELECT id, title, created, updated
     , NULL::timestamp AS closed  -- redundant, rather drop it
FROM   tickets
WHERE  closed IS NULL;


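We don't need SELECT *; closed IS NULL is already given by the WHERE clause. So we can use the tiny partial index in a fast index-only scan, assuming you meet the prerequisites (which is why I run VACUUM there, to update the visibility map). This is one of the rare cases where a query reading more than about 5% of all rows will still happily use an index (even up to the whole table).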
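Whether the planner actually picks an index-only scan can be checked with EXPLAIN. A minimal sketch, assuming the partial index closed_index on (id, title, created, updated) has been created and the table was vacuumed recently:

```sql
-- Show the chosen plan together with buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, title, created, updated
FROM   tickets
WHERE  closed IS NULL;
-- In the output, look for an "Index Only Scan using closed_index" node
-- and its "Heap Fetches" line: a count near zero means the visibility
-- map allowed Postgres to skip the heap for almost all rows.
```

A large Heap Fetches count means many pages are not marked all-visible, so the scan still has to visit the table.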

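There seems to be redundancy in your design, and it should be possible to simplify it.

This works since Postgres 9.6, quoting the release notes:

Allow use of an index-only scan on a partial index when the index's WHERE clause references columns that are not indexed (Tomas Vondra, Kyotaro Horiguchi)

For example, an index defined by CREATE INDEX tidx_partial ON t(b) WHERE a > 0 can now be used for an index-only scan by a query that specifies WHERE a > 0 and does not otherwise use a. Previously this was disallowed because a is not listed as an index column.

Or the information in your question is misleading.

Related:

If you don't see an index-only scan even after running VACUUM, then a high write load may be getting in the way, so that the visibility map never reaches a state that allows index-only scans. See the manual. Or there is another problem in your database that keeps VACUUM from doing its job. Related: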

Postgres: "vacuum" command does not clean up dead tuples
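The state of the visibility map can be inspected through pg_class, which helps tell "VACUUM is not keeping up" apart from "the planner chose another plan". A minimal sketch, again assuming the table name tickets; relallvisible is an estimate updated by VACUUM and ANALYZE.

```sql
-- Share of heap pages marked all-visible in the visibility map.
SELECT relpages,
       relallvisible,
       round(100.0 * relallvisible / NULLIF(relpages, 0), 1) AS all_visible_pct
FROM   pg_class
WHERE  oid = 'tickets'::regclass;
```

If all_visible_pct stays low shortly after a VACUUM, index-only scans will keep falling back to heap fetches.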
