Postgres 中的 EXISTS() 与 EXISTS() = TRUE

sco*_*rpp 5 postgresql exists postgresql-performance

面临奇怪的行为EXISTS(也适用于NOT EXISTS)生成不同的执行计划

WHERE EXISTS(...)

EXPLAIN ANALYZE
SELECT * FROM books
WHERE EXISTS (SELECT 1 FROM authors WHERE id = books.author_id AND name LIKE 'asd%');

| QUERY PLAN                                                                                                     |
| -------------------------------------------------------------------------------------------------------------- |
| Hash Join  (cost=218.01..454.43 rows=56 width=40) (actual time=0.975..0.975 rows=0 loops=1)                    |
|   Hash Cond: (books.author_id = authors.id)                                                                    |
|   ->  Seq Scan on books  (cost=0.00..206.80 rows=11280 width=40) (actual time=0.010..0.010 rows=1 loops=1)     |
|   ->  Hash  (cost=217.35..217.35 rows=53 width=4) (actual time=0.943..0.943 rows=0 loops=1)                    |
|         Buckets: 1024  Batches: 1  Memory Usage: 8kB                                                           |
|         ->  Seq Scan on authors  (cost=0.00..217.35 rows=53 width=4) (actual time=0.942..0.943 rows=0 loops=1) |
|               Filter: ((name)::text ~~ 'asd%'::text)                                                           |
|               Rows Removed by Filter: 10000                                                                    |
| Planning Time: 0.361 ms                                                                                        |
| Execution Time: 1.022 ms                                                                                       |
Run Code Online (Sandbox Code Playgroud)

WHERE EXISTS(...) = TRUE

EXPLAIN ANALYZE
SELECT * FROM books
WHERE EXISTS (SELECT id FROM authors WHERE id = books.author_id AND name LIKE 'asd%') = True;

| QUERY PLAN                                                                                                           |
| -------------------------------------------------------------------------------------------------------------------- |
| Seq Scan on books  (cost=0.00..93887.20 rows=5640 width=40) (actual time=2.054..2.054 rows=0 loops=1)                |
|   Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)                                                              |
|   Rows Removed by Filter: 10000                                                                                      |
|   SubPlan 1                                                                                                          |
|     ->  Index Scan using authors_pkey on authors  (cost=0.29..8.30 rows=1 width=0) (never executed)                  |
|           Index Cond: (id = books.author_id)                                                                         |
|           Filter: ((name)::text ~~ 'asd%'::text)                                                                     |
|   SubPlan 2                                                                                                          |
|     ->  Seq Scan on authors authors_1  (cost=0.00..217.35 rows=53 width=4) (actual time=0.931..0.931 rows=0 loops=1) |
|           Filter: ((name)::text ~~ 'asd%'::text)                                                                     |
|           Rows Removed by Filter: 10000                                                                              |
| Planning Time: 0.298 ms                                                                                              |
| Execution Time: 2.129 ms                                                                                             |
Run Code Online (Sandbox Code Playgroud)

特别令人感兴趣的是哈希连接与简单的序列扫描以及 2 倍时间差异。

小提琴https://www.db-fiddle.com/f/a7kedUeFmCu4tWLFRoxMJA/2

Lau*_*lbe 9

PostgreSQL 能够优化WHERE EXISTS (/* correlated subquery */)为 join 或 semi-join,但它不够智能,无法检测到= TRUEinEXISTS (...) = TRUE可以被删除,因此它在这里不应用优化。

由于未使用优化,因此第二个计划较慢也就不足为奇了。不过,说实话,对于这样一个微小的查询,差异可能只是噪音。


第二个执行计划的一些背景:

第二个计划显示alternatives:您正在使用旧版本的 PostgreSQL,它仍然有AlternativeSubPlans. 其背后的想法是,如果行计数估计被证明是错误的,PostgreSQL 可能会决定在查询执行期间开始使用不同的子计划。此功能已在 v14 中通过提交41efb83408删除。您可能需要参考 Tom Lane 的提交消息以了解详细信息:

Move resolution of AlternativeSubPlan choices to the planner.

When commit bd3daddaf introduced AlternativeSubPlans, I had some
ambitions towards allowing the choice of subplan to change during
execution.  That has not happened, or even been thought about, in the
ensuing twelve years; so it seems like a failed experiment.  So let's
rip that out and resolve the choice of subplan at the end of planning
(in setrefs.c) rather than during executor startup.  This has a number
of positive benefits:

* Removal of a few hundred lines of executor code, since
AlternativeSubPlans need no longer be supported there.

* Removal of executor-startup overhead (particularly, initialization
of subplans that won't be used).

* Removal of incidental costs of having a larger plan tree, such as
tree-scanning and copying costs in the plancache; not to mention
setrefs.c's own costs of processing the discarded subplans.

* EXPLAIN no longer has to print a weird (and undocumented)
representation of an AlternativeSubPlan choice; it sees only the
subplan actually used.  This should mean less confusion for users.

* Since setrefs.c knows which subexpression of a plan node it's
working on at any instant, it's possible to adjust the estimated
number of executions of the subplan based on that.  For example,
we should usually estimate more executions of a qual expression
than a targetlist expression.  The implementation used here is
pretty simplistic, because we don't want to expend a lot of cycles
on the issue; but it's better than ignoring the point entirely,
as the executor had to.

That last point might possibly result in shifting the choice
between hashed and non-hashed EXISTS subplans in a few cases,
but in general this patch isn't meant to change planner choices.
Since we're doing the resolution so late, it's really impossible
to change any plan choices outside the AlternativeSubPlan itself.

Patch by me; thanks to David Rowley for review.

Discussion: https://postgr.es/m/1992952.1592785225@sss.pgh.pa.us
Run Code Online (Sandbox Code Playgroud)