Why does COALESCE prevent the use of an index on a varchar column, but not on a text column?

Question

Lon*_*Rob 5 postgresql varchar execution-plan index-tuning postgresql-performance

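Imagine a view that contains a COALESCE of two varchar columns from two different tables.

The underlying varchar columns are indexed in both tables.

In Postgres 11.6, filtering this view on the result of the COALESCE does not use the indexes and performs table scans instead.

However, if I change the columns to text, filtering exactly the same view on the same column uses the indexes as you would expect.

Example

Let's say I have a table of measured values for some identifiers over time. There is also a nearly identical table holding estimated values: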

CREATE TABLE measured (
  id int,
  ts timestamp,
  identifier character varying,
  measured_value int
);
CREATE INDEX ON measured(identifier);

CREATE TABLE estimated (
  id int,
  ts timestamp,
  identifier character varying,
  estimated_value int
);
CREATE INDEX ON estimated(identifier);

Each table holds 1 million rows of data:

INSERT INTO measured
SELECT
    generate_series(1, 1000000),
    to_timestamp((random() * 100000)::int),
    left(md5(random()::text), 2),
    random() * 10;

INSERT INTO estimated
SELECT
    generate_series(1, 1000000),
    to_timestamp((random() * 100000)::int),
    left(md5(random()::text), 2),
    random() * 10;

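As we would expect, filtering on the identifier column in either table on its own uses the index.

We have a view that returns all measured data and all estimated data, combined into a single row wherever the identifier and timestamp are the same: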

CREATE VIEW combined AS
SELECT
    COALESCE(measured.ts, estimated.ts) AS ts,
    COALESCE(measured.identifier, estimated.identifier) AS identifier,
    measured_value,
    estimated_value
FROM measured
FULL OUTER JOIN estimated ON measured.identifier = estimated.identifier
                         AND measured.ts = estimated.ts;

Filtering this combined view on the identifier column performs table scans:

EXPLAIN ANALYZE
SELECT * FROM combined
WHERE identifier = 'ab';

However, if we change the data type of identifier to text, filtering the view uses the indexes as expected.

Is this a bug?

For convenience, I have pasted the entire script here
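
For readers without the script at hand, the switch to text described above can be sketched roughly as follows. This is a minimal sketch that assumes the two tables and the view exist exactly as defined in the question; the view is dropped and recreated because Postgres refuses to change the type of a column that a view depends on.

```sql
-- Sketch only: assumes measured, estimated and combined exist as defined above.
BEGIN;

-- A column referenced by a view cannot change type, so drop the view first.
DROP VIEW combined;

ALTER TABLE measured  ALTER COLUMN identifier TYPE text;
ALTER TABLE estimated ALTER COLUMN identifier TYPE text;

-- Recreate the view unchanged.
CREATE VIEW combined AS
SELECT
    COALESCE(measured.ts, estimated.ts) AS ts,
    COALESCE(measured.identifier, estimated.identifier) AS identifier,
    measured_value,
    estimated_value
FROM measured
FULL OUTER JOIN estimated ON measured.identifier = estimated.identifier
                         AND measured.ts = estimated.ts;

COMMIT;

-- Refresh statistics, then compare the plan with the varchar version.
ANALYZE measured;
ANALYZE estimated;

EXPLAIN
SELECT * FROM combined
WHERE identifier = 'ab';
```

Comparing this plan with the one from the varchar setup shows whether the filter on identifier reaches the indexes as an index condition.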

Answer 1

Erw*_*ter 3

I can confirm the issue. Even in Postgres 13. Even after:

SET enable_seqscan = off;


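... which forces index use if at all possible. To be precise, the indexes are used, but only as a full index scan, a last resort to avoid the sequential scan. What indicates that an index is used in a meaningful way is an Index Cond, not a Filter, in the EXPLAIN output. Like: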

Index Cond: (identifier = 'ab'::text)

Instead of:

Filter: ((COALESCE(measured.identifier, estimated.identifier))::text = 'ab'::text)

A lone index scan without an index condition "uses" the index, but not in a useful way.
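
To repeat this check without leaving the planner setting changed for the rest of the session, one option is to scope it to a transaction with SET LOCAL. This is only a sketch, assuming the combined view from the question exists:

```sql
-- Sketch: limit the planner override to a single transaction.
BEGIN;

-- SET LOCAL reverts automatically at COMMIT or ROLLBACK.
SET LOCAL enable_seqscan = off;

-- Look for "Index Cond:" (index used selectively) versus
-- "Filter:" (rows are fetched first and discarded afterwards).
EXPLAIN
SELECT * FROM combined
WHERE identifier = 'ab';

ROLLBACK;
```

Plain EXPLAIN only plans the query, so this is cheap to run even on the full tables.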

A simpler query

First, let's simplify the base query with a USING clause in the join condition:

CREATE VIEW combined AS
SELECT ts
     , identifier
     , m.measured_value
     , e.estimated_value
FROM   measured m
FULL   JOIN estimated e USING (identifier, ts);

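Equivalent, except that the join columns identifier and ts are added to the output column list only once, which can only be beneficial, especially since SELECT * now delivers unique column names.

But it does nothing for the index problem.

Investigation

This is certainly related to the fact that text is the "preferred" type among string types. varchar is binary-compatible with text; it is basically just an alias, or rather, varchar behaves like a domain with base type text. An index on a varchar column uses the type text internally.

This makes Postgres add a cast to text (effectively a no-op) when using the index. It is apparent in EXPLAIN output even for the simplest query: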

EXPLAIN SELECT * FROM combined WHERE identifier = 'ab';

...
Index Cond: ((identifier)::text = 'ab'::text)
...

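
The added cast is easiest to observe on one of the base tables in isolation. A minimal sketch, assuming the varchar version of the measured table from the question:

```sql
-- Sketch: single-table filter on the indexed varchar column.
-- With identifier declared as varchar, the condition in the plan is
-- expected to carry the cast, i.e. (identifier)::text = 'ab'::text.
EXPLAIN
SELECT * FROM measured
WHERE identifier = 'ab';
```

Depending on table statistics, the index may appear under an Index Scan or a Bitmap Index Scan node; either way, the condition attached to it is the part to look at.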

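This cast to text becomes an obstacle along the way. I am not sure where exactly the query planner loses confidence in the applicability of the index. Using text to begin with obviously avoids the problem.

Views are implemented with query rewrite rules. I think we can rule out any involvement of the rule system. I can reproduce the problem with the query alone, without a VIEW (in Postgres 13):

db<>fiddle here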
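
Since the linked fiddle is not reproduced here, the view-free variant can be sketched like this. It simply inlines the simplified view definition from above; the aliases m and e are the ones used there:

```sql
-- Sketch: the same filter without a view in between.
EXPLAIN
SELECT ts
     , identifier
     , m.measured_value
     , e.estimated_value
FROM   measured m
FULL   JOIN estimated e USING (identifier, ts)
WHERE  identifier = 'ab';
```

If the plan shows the same Filter on the COALESCE expression as the view did, the rewrite rule system is not the cause.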

Again, the indexes are used with the data type text:

db<>fiddle here

But only with COALESCE and FULL OUTER JOIN.

To make sure, I tested the same scenario with an integer column. The result is the same as with text:

db<>fiddle here
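
An integer variant of the test might look roughly like the sketch below. The table names measured_int and estimated_int and the value range 0 to 255 are assumptions made for illustration, not taken from the fiddle:

```sql
-- Sketch: hypothetical integer version of the test tables.
CREATE TABLE measured_int (
  id int,
  ts timestamp,
  identifier int,
  measured_value int
);
CREATE INDEX ON measured_int (identifier);

CREATE TABLE estimated_int (
  id int,
  ts timestamp,
  identifier int,
  estimated_value int
);
CREATE INDEX ON estimated_int (identifier);

-- Same data shape as in the question, with a small integer as identifier.
INSERT INTO measured_int
SELECT g, to_timestamp((random() * 100000)::int), (random() * 255)::int, random() * 10
FROM   generate_series(1, 1000000) g;

INSERT INTO estimated_int
SELECT g, to_timestamp((random() * 100000)::int), (random() * 255)::int, random() * 10
FROM   generate_series(1, 1000000) g;

ANALYZE measured_int;
ANALYZE estimated_int;

-- Check whether the filter shows up as an Index Cond on both indexes.
EXPLAIN
SELECT ts, identifier, m.measured_value, e.estimated_value
FROM   measured_int m
FULL   JOIN estimated_int e USING (identifier, ts)
WHERE  identifier = 171;
```

No cast is involved for integer, which makes this a useful control case next to varchar and text.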

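Conclusion

COALESCE in combination with FULL OUTER JOIN seems to get special treatment from the query planner, so that indexes are applicable. COALESCE is used in the expression that computes the output column identifier, explicitly in the original query or implicitly in my simplified query. That seems to match the way the join condition of a FULL OUTER JOIN is implemented internally. With varchar, this misfires for some reason, most likely because of the added (logically irrelevant) cast to ::text.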