Strange optimization effect of using the WITH construct

Question

And*_*niy 7 postgresql performance optimization execution-plan amazon-rds query-performance

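I have 4 tables; let's name them:

Table A, 15M rows
Table B, 40K rows
Table C, 30K rows
Table D, 25M rows

(M means millions, K means thousands)

I have a legacy query that is structured like this: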

select C.<some_fields>,B.<some_fields>,D.<some_fields> from C
inner join A on C.x = A.x
inner join D on D.z = 123 and D.a_id = A.a_id
inner join B on C.x = B.x and B.z = 123
where A.type = 'Xxx'


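This query is very slow: it takes up to 3 minutes to return its result (for the particular case shown here, it returns 35K rows).

But when I change it to the following structure: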

with t as (
   select C.<some_fields>,D.<some_fields> from C
   inner join A on C.x = A.x
   inner join D on D.z = 123 and D.a_id = A.a_id
   where A.type = 'Xxx'
)
select t.*, B.<some_fields> from t
inner join B on t.x = B.x and B.z = 123


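It starts running 30 times faster (it now takes at most 6 seconds to retrieve the same result).

Let's assume the indexes are built correctly. The idea for this trick came when I noticed that the block I have put inside with ( ... ) runs very fast on its own (and returns roughly the same amount of data as the whole query).

So my question is: what could be the reason? Why can't Postgres build a proper plan internally, or do the same trick by itself?

UPDATE:

Execution plan for the old query: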

Nested Loop  (cost=1.83..1672.82 rows=1 width=54) (actual time=8.178..91515.625 rows=37373 loops=1)
  ->  Nested Loop  (cost=1.42..1671.47 rows=1 width=62) (actual time=8.108..90883.567 rows=37373 loops=1)
        Join Filter: (a.x = b.x)
        Rows Removed by Join Filter: 9132436
        ->  Index Scan using b_pkey on B b  (cost=0.41..8.43 rows=1 width=71) (actual time=0.022..0.782 rows=241 loops=1)
              Index Cond: (z = 123)
        ->  Nested Loop  (cost=1.00..1660.48 rows=146 width=149) (actual time=0.027..363.227 rows=38049 loops=241)
              ->  Index Only Scan using idx_1 on D d  (cost=0.56..424.59 rows=146 width=8) (actual time=0.017..50.869 rows=64176 loops=241)
                    Index Cond: (z = 123)
                    Heap Fetches: 15564503
              ->  Index Scan using a_pkey on A a  (cost=0.44..8.46 rows=1 width=149) (actual time=0.003..0.004 rows=1 loops=15466416)
                    Index Cond: (a_id = d.a_id)
  ->  Index Scan using c_pkey on C c (cost=0.41..1.08 rows=1 width=8) (actual time=0.005..0.007 rows=1 loops=37373)
        Index Cond: (x = a.x)
        Filter: ((type)::text = 'Xxx')
Planning time: 3.468 ms
Execution time: 91541.019 ms


Execution plan for the new query:

Hash Join  (cost=1828.09..1830.28 rows=1 width=94) (actual time=0.654..1130.542 rows=37376 loops=1)
  Hash Cond: (t.x = b.x)
  CTE t
    ->  Nested Loop  (cost=1.42..1819.64 rows=81 width=158) (actual time=0.060..761.058 rows=38052 loops=1)
          ->  Nested Loop  (cost=1.00..1660.48 rows=146 width=149) (actual time=0.039..461.235 rows=38052 loops=1)
                ->  Index Only Scan using idx_1 on D d  (cost=0.56..424.59 rows=146 width=8) (actual time=0.024..73.972 rows=64179 loops=1)
                      Index Cond: (z = 123)
                      Heap Fetches: 64586
                ->  Index Scan using a_pkey on A a  (cost=0.44..8.46 rows=1 width=149) (actual time=0.004..0.004 rows=1 loops=64179)
                      Index Cond: (a_id = d.a_id)
          ->  Index Scan using c_pkey on C c  (cost=0.41..1.07 rows=1 width=17) (actual time=0.004..0.005 rows=1 loops=38052)
                Index Cond: (x = a.x)
                Filter: ((type)::text = 'Xxx')
  ->  CTE Scan on t  (cost=0.00..1.62 rows=81 width=104) (actual time=0.063..854.405 rows=38052 loops=1)
  ->  Hash  (cost=8.43..8.43 rows=1 width=71) (actual time=0.353..0.353 rows=241 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 34kB
        ->  Index Scan using b_pkey on B b  (cost=0.41..8.43 rows=1 width=71) (actual time=0.012..0.262 rows=241 loops=1)
              Index Cond: (z = 123)
Planning time: 1.221 ms
Execution time: 1147.267 ms


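UPDATE-2:

Kind commenters recently noticed that the problem is caused by wrong row-count estimates, and they suggested that I run VACUUM ANALYZE. However, I am running this on an Amazon RDS server with the autovacuum feature enabled. I also tried running the script, suggested in the Amazon RDS documentation, that shows tables eligible for vacuum, and it showed me 0 tables eligible for vacuum.

UPDATE-3: Running ANALYZE, as suggested in the comments, did not change the bad row estimates in the plan, but it did speed up the "old" query variant. I still don't fully understand the answer to my core question: why is the second form of the query so much faster (even without ANALYZE)?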

Answer 1

fil*_*rem 1

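From the EXPLAIN output I can clearly see that PostgreSQL produced wrong estimates and chose a bad plan for the first query. The equality selectivity estimate for column z of table D is completely wrong (off by a factor of about 500).

As Craig pointed out, the second query does better because WITH is an optimization fence: the CTE is planned on its own, so the planner cannot reorder the join with B into it.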
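A sketch of what the fence looks like in practice, assuming the anonymized table and column names from the question (the selected columns are placeholders, not the asker's real field list). In older PostgreSQL releases every CTE was planned and materialized separately from the outer query. From PostgreSQL 12 onward, a side-effect-free CTE that is referenced only once may be inlined into the outer query, so the fence has to be requested explicitly with MATERIALIZED:

```sql
-- Requires PostgreSQL 12 or later; older versions always materialize CTEs
-- and do not accept the MATERIALIZED keyword.
WITH t AS MATERIALIZED (
    SELECT c.x, d.a_id              -- placeholder column list
    FROM c
    INNER JOIN a ON c.x = a.x
    INNER JOIN d ON d.z = 123 AND d.a_id = a.a_id
    WHERE a.type = 'Xxx'
)
SELECT t.*, b.x AS b_x              -- placeholder column list
FROM t
INNER JOIN b ON t.x = b.x AND b.z = 123;
```

The fence works around the bad estimate rather than fixing it: the CTE in the new plan is still estimated at 81 rows while it actually returns 38052.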

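Let's focus on the first query.

The planner produces wrong estimates because the statistics are missing, stale, or insufficient.

Almost all of the statistics can be seen in pg_stats. You should check this view and, ideally, paste the relevant rows here.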

SELECT * FROM pg_stats WHERE tablename='d' and attname='z';


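If the column has an unusual statistical distribution (non-uniform, non-Gaussian, skewed, pseudo-random with hidden patterns, and so on), the statistics module behind ANALYZE may fail to capture the regularity the planner needs.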
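One way to see whether that is happening is to put the planner's view of the column next to the real count. The following is only a sketch, assuming the table is named d and lives in a schema where that name is unambiguous; the columns selected from pg_stats and pg_class are standard ones:

```sql
-- What the planner believes about d.z.
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'd' AND attname = 'z';

-- The planner's idea of the table size.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'd';

-- Ground truth for the value used in the query.
SELECT count(*) AS actual_rows
FROM d
WHERE z = 123;
```

If 123 is absent from most_common_vals, the planner falls back to a generic per-value estimate derived from n_distinct, which would be consistent with the plan above expecting 146 rows where about 64K are actually found.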

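In many cases a larger sample helps produce realistic estimates. There is a configuration parameter that raises the sample size used by ANALYZE; it can be used like this: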

SET default_statistics_target TO 200;
ANALYZE A;
ANALYZE B;
ANALYZE C;
SET default_statistics_target TO 1000;
ANALYZE D;


Experiment with the values and retry the SELECT query. If it helps, you can increase the sample size permanently with ALTER TABLE D ALTER COLUMN Z SET STATISTICS 1000.
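Put together, the permanent variant might look like the sketch below. It assumes the same anonymized names as the question, and the select list is a placeholder. The per-column target only takes effect at the next ANALYZE, so the table has to be analyzed again before re-checking the plan:

```sql
-- Store a larger statistics target for this one column.
ALTER TABLE d ALTER COLUMN z SET STATISTICS 1000;

-- Recollect statistics so the new target is actually used.
ANALYZE d;

-- Re-run the original query and compare estimated and actual row counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.x, b.x, d.a_id             -- placeholder column list
FROM c
INNER JOIN a ON c.x = a.x
INNER JOIN d ON d.z = 123 AND d.a_id = a.a_id
INNER JOIN b ON c.x = b.x AND b.z = 123
WHERE a.type = 'Xxx';
```

The thing to look at is the scan on d: the rows= figure in the cost part should move toward the rows= figure in the actual part.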

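P.S. As always, you should make sure that the memory and resource settings are correct. All settings in the "resource configuration" section should be tuned to the actual server resources (database and memory size, RAID array type, SSD drives).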

Asked:	8 years, 5 months ago
Viewed:	166 times
Last activity:	8 years, 3 months ago