Postgres CTE query not optimizing for a single row

Question


I'm trying to get a weighted sum of user stats from my database. I will only ever query the table for one or two users at a time, so I wrote it as a view.

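Because it's a view, I wrote it as if it computes the sum for every row in the table, hoping the optimizer would notice when I ask for only one row and optimize the query accordingly. However, my query plan is huge, and at its innermost level it computes 17 billion rows, when I think there should be at most 1000.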

Here is the query:

CREATE OR REPLACE VIEW weighted_stats AS
WITH 
    -- the most recently trained classifier
    clf AS (SELECT * FROM classifiers order by time_trained desc limit 1),
    -- its weights, one (key, value) row per hstore entry
    weights AS (SELECT kv.key, kv.value from clf, each(clf.weights) AS kv),
    -- every player's stats, one (player_id, key, value) row per hstore entry
    kvs AS (
        SELECT stats.player_id, kv.key, kv.value FROM
        stats, each(stats.hstore_column) AS kv)
SELECT
    kvs.player_id,
    SUM(kvs.value :: numeric * weights.value :: numeric) AS stats
FROM
    kvs JOIN weights USING (key)
GROUP BY kvs.player_id;


Here is the query plan:

explain analyze select * from weighted_stats where player_id=76561197960269296

GroupAggregate  (cost=53645.35..299471.72 rows=1 width=72) (actual time=1014.016..1014.016 rows=0 loops=1)
   Group Key: kvs.id
   CTE clf
     ->  Limit  (cost=20.65..20.65 rows=1 width=84) (actual time=0.017..0.018 rows=1 loops=1)
           ->  Sort  (cost=20.65..22.43 rows=710 width=84) (actual time=0.014..0.014 rows=1 loops=1)
                 Sort Key: classifiers.time_trained
                 Sort Method: quicksort  Memory: 25kB
                 ->  Seq Scan on classifiers  (cost=0.00..17.10 rows=710 width=84) (actual time=0.003..0.005 rows=1 loops=1)
   CTE kvs
     ->  Seq Scan on stats  (cost=0.00..53572.18 rows=10318000 width=722) (actual time=0.037..530.337 rows=336036 loops=1)
   CTE weights
     ->  Nested Loop  (cost=0.00..20.02 rows=1000 width=64) (actual time=0.036..0.046 rows=2 loops=1)
           ->  CTE Scan on clf  (cost=0.00..0.02 rows=1 width=32) (actual time=0.020..0.023 rows=1 loops=1)
           ->  Function Scan on each kv  (cost=0.00..10.00 rows=1000 width=64) (actual time=0.011..0.013 rows=2 loops=1)
   ->  Hash Join  (cost=32.50..241344.73 rows=257950 width=72) (actual time=1014.012..1014.012 rows=0 loops=1)
         Hash Cond: (kvs.key = weights.key)
         ->  CTE Scan on kvs  (cost=0.00..232155.00 rows=51590 width=72) (actual time=0.044..1013.877 rows=62 loops=1)
               Filter: (id = 76561197960269296::bigint)
               Rows Removed by Filter: 335974
         ->  Hash  (cost=20.00..20.00 rows=1000 width=64) (actual time=0.060..0.060 rows=2 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 1kB
               ->  CTE Scan on weights  (cost=0.00..20.00 rows=1000 width=64) (actual time=0.040..0.054 rows=2 loops=1)
 Planning time: 0.286 ms
 Execution time: 1017.671 ms


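This is still much slower than I expected. The optimization partly works, in that the filter is applied before the join instead of just before the grouping, but the kvs CTE (which should itself be filtered) still seems to be computed for every player.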

Answer 1

chi*_*rlu 6

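PostgreSQL treats common table expressions as "optimization fences": it never pushes predicates from the main query down into a CTE, and it does not flatten any joins across the CTE boundary. Instead, it generally evaluates the entire CTE as written and materializes the result; the main query then reads from the temporary result produced by the CTE.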

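That is what your plan shows: the kvs CTE produces all 336036 key/value rows, and only then does the `id` filter on `CTE Scan on kvs` throw away 335974 of them. So yes, your query would probably benefit from turning the CTEs into subqueries.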
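A minimal sketch of that rewrite, assuming the schema implied by the question's query: `classifiers.weights` and `stats.hstore_column` are `hstore` columns and `player_id` is a `bigint`. The temporary tables and sample rows at the top are hypothetical stand-ins so the script runs on its own (because they are temporary, PostgreSQL also makes the view temporary); against the real database, keep only the `CREATE VIEW`. Each CTE becomes a derived table in `FROM`, which the planner is free to flatten into the outer query.

```sql
-- Hypothetical stand-ins for the question's tables, inferred from its query;
-- skip these (and the INSERTs) in a real database. Needs the hstore extension.
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE TEMP TABLE classifiers (time_trained timestamptz, weights hstore);
CREATE TEMP TABLE stats (player_id bigint PRIMARY KEY, hstore_column hstore);
INSERT INTO classifiers VALUES
    ('2015-01-01', 'kills=>1'),                  -- older classifier, ignored
    ('2015-06-01', 'kills=>2, assists=>0.5');    -- newest one is used
INSERT INTO stats VALUES
    (76561197960269296, 'kills=>10, deaths=>4, assists=>7');

-- Same view, with each CTE turned into a derived table (subquery) that the
-- planner can flatten into the outer query.
CREATE OR REPLACE VIEW weighted_stats AS
SELECT
    kvs.player_id,
    SUM(kvs.value :: numeric * weights.value :: numeric) AS stats
FROM
    (SELECT stats.player_id, kv.key, kv.value
       FROM stats, each(stats.hstore_column) AS kv) AS kvs
    JOIN
    (SELECT kv.key, kv.value
       FROM (SELECT * FROM classifiers
              ORDER BY time_trained DESC LIMIT 1) AS clf,
            each(clf.weights) AS kv) AS weights
    USING (key)
GROUP BY kvs.player_id;

-- The player_id condition should now be applied at the scan of stats.
EXPLAIN SELECT * FROM weighted_stats WHERE player_id = 76561197960269296;
```

Because `player_id` is the view's only `GROUP BY` column, PostgreSQL can push a condition on it through the aggregate. In the `EXPLAIN` output it should then appear on the scan of `stats` (an index scan on a large table where `player_id` is indexed) rather than as a `Filter` on a `CTE Scan`. With the sample rows, the player's weighted sum is 10 * 2 + 7 * 0.5 = 23.5; `deaths` has no weight, so the join drops it.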

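Note that an actual view (created with CREATE VIEW) does not act as an optimization fence: the view's definition is merged into the query that uses it and then optimized as usual. For CTEs, making the fence behavior optional has been discussed, so that CTEs could be used "just" to make queries more readable. However, as of version 9.5 this has not been implemented.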
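For readers on newer releases: PostgreSQL 12 did make the fence optional. A non-recursive CTE without side effects that is referenced only once is inlined by default, and `AS MATERIALIZED` / `AS NOT MATERIALIZED` control this explicitly. The sketch below keeps the question's CTE-based view and marks each CTE `NOT MATERIALIZED`. It assumes PostgreSQL 12 or later (the keyword is a syntax error on 9.5) and uses hypothetical temporary stand-in tables, with column types inferred from the question's query, so it runs on its own.

```sql
-- Requires PostgreSQL 12+ (NOT MATERIALIZED) and the hstore extension.
CREATE EXTENSION IF NOT EXISTS hstore;
-- Hypothetical stand-ins for the question's tables; skip in a real database.
CREATE TEMP TABLE classifiers (time_trained timestamptz, weights hstore);
CREATE TEMP TABLE stats (player_id bigint PRIMARY KEY, hstore_column hstore);
INSERT INTO classifiers VALUES
    ('2015-01-01', 'kills=>1'),                  -- older classifier, ignored
    ('2015-06-01', 'kills=>2, assists=>0.5');    -- newest one is used
INSERT INTO stats VALUES
    (76561197960269296, 'kills=>10, deaths=>4, assists=>7');

-- The question's view, with every CTE explicitly allowed to be inlined,
-- so the caller's WHERE clause can reach the scan of stats.
CREATE OR REPLACE VIEW weighted_stats AS
WITH
    clf AS NOT MATERIALIZED (
        SELECT * FROM classifiers ORDER BY time_trained DESC LIMIT 1),
    weights AS NOT MATERIALIZED (
        SELECT kv.key, kv.value FROM clf, each(clf.weights) AS kv),
    kvs AS NOT MATERIALIZED (
        SELECT stats.player_id, kv.key, kv.value
        FROM stats, each(stats.hstore_column) AS kv)
SELECT
    kvs.player_id,
    SUM(kvs.value :: numeric * weights.value :: numeric) AS stats
FROM kvs JOIN weights USING (key)
GROUP BY kvs.player_id;

-- With every CTE inlined, the plan should contain no CTE Scan nodes.
EXPLAIN SELECT * FROM weighted_stats WHERE player_id = 76561197960269296;
```

Since each CTE here is referenced only once, PostgreSQL 12+ would inline them even without the keyword; spelling out `NOT MATERIALIZED` mainly documents the intent. Conversely, `AS MATERIALIZED` restores the 9.5-style fence when that is actually wanted.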

Archived:	9 years, 9 months ago
Views:	1256
Last activity:	9 years, 9 months ago