ORDER BY clause slows down query performance

Ima*_* Y. 5 postgresql performance order-by postgresql-10 postgresql-performance

Context:

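PostgreSQL 10. The users table has 3667438 records and a JSONB column named social. We usually index the output of computed functions, so that we can aggregate the information into a single index. The output of the engagement(social) function is of type double precision.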

Problem:

The problematic clause is ORDER BY engagement(social) DESC NULLS LAST. There is also a btree index, idx_in_social_engagement, defined with DESC NULLS LAST on this expression.

Fast query:

EXPLAIN ANALYZE
SELECT  "users".* FROM "users"
WHERE (follower_count(social) < 500000)
AND (engagement(social) > 0.03)
AND (engagement(social) < 0.25)
AND (peemv(social) < 533)
ORDER BY "users"."created_at" ASC
LIMIT 12 OFFSET 0;

Limit  (cost=0.43..52.25 rows=12 width=1333) (actual time=0.113..1.625 rows=12 loops=1)
   ->  Index Scan using created_at_idx on users  (cost=0.43..7027711.55 rows=1627352 width=1333) (actual time=0.112..1.623 rows=12 loops=1)
         Filter: ((follower_count(social) < 500000) AND (engagement(social) > '0.03'::double precision) AND (engagement(social) <  '0.25'::double precision) AND (peemv(social) > '0'::double precision) AND (peemv(social) < '533'::double precision))
         Rows Removed by Filter: 8
 Planning time: 0.324 ms
 Execution time: 1.639 ms

Slow query:

EXPLAIN ANALYZE 
SELECT  "users".* FROM "users" 
WHERE (follower_count(social) < 500000) 
AND (engagement(social) > 0.03) 
AND (engagement(social) < 0.25) 
AND (peemv(social) > 0.0) 
AND (peemv(social) < 533) 
ORDER BY engagement(social) DESC NULLS LAST, "users"."created_at" ASC 
LIMIT 12 OFFSET 0;

Limit  (cost=2884438.00..2884438.03 rows=12 width=1341) (actual time=68011.728..68011.730 rows=12 loops=1)
->  Sort  (cost=2884438.00..2888506.38 rows=1627352 width=1341) (actual time=68011.727..68011.728 rows=12 loops=1)
        Sort Key: (engagement(social)) DESC NULLS LAST, created_at
        Sort Method: top-N heapsort  Memory: 45kB
        ->  Index Scan using idx_in_social_engagement on users  (cost=0.43..2847131.26 rows=1627352 width=1341) (actual time=0.082..67019.102 rows=1360633 loops=1)
            Index Cond: ((engagement(social) > '0.03'::double precision) AND (engagement(social) < '0.25'::double precision))
            Filter: ((follower_count(social) < 500000) AND (peemv(social) > '0'::double precision) AND (peemv(social) < '533'::double precision))
            Rows Removed by Filter: 85580
Planning time: 0.312 ms
Execution time: 68011.752 ms

I select with * because I need all the data stored in each row.

Update:

CREATE INDEX idx_in_social_engagement on influencers USING BTREE ( engagement(social) DESC NULLS LAST)

This is the exact index definition.

jja*_*nes 7

Your ORDER BY clause is:

engagement(social) DESC NULLS LAST, "users"."created_at" ASC

But I suspect your index is only on:

engagement(social) DESC NULLS LAST

So the index cannot fully support the ORDER BY.

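You can reproduce the same problem without JSONB or an expression index. You can salvage the situation by creating a composite expression index on the columns in your ORDER BY.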
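
A minimal sketch of such a composite expression index, assuming the table is users (as in the queries above) and that created_at is an ordinary column on it. The index name idx_users_engagement_created_at is hypothetical:

```sql
-- Hypothetical index name. The key order and sort directions mirror the
-- ORDER BY: engagement(social) DESC NULLS LAST, then created_at ASC.
CREATE INDEX idx_users_engagement_created_at
    ON users USING btree (engagement(social) DESC NULLS LAST, created_at ASC);
```

With an index ordered this way, the planner should be able to read rows already in the requested order and stop as soon as 12 of them pass the remaining filters, rather than sorting every candidate row. Re-running EXPLAIN ANALYZE is the way to confirm whether it actually does.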

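If the PostgreSQL planner were infinitely wise, it might be able to use the existing index efficiently. It would have to walk the index on engagement(social) DESC NULLS LAST until it had collected 12 tuples that satisfy all the remaining filter conditions. It would then keep walking that index until it had collected all the remaining tuples whose engagement(social) ties with that of the 12th tuple (and which also meet the other criteria). Finally, it would have to re-sort all the collected tuples on the full ORDER BY and apply the LIMIT 12 to that expanded and re-sorted set. But the PostgreSQL planner is not infinitely wise.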
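
The strategy described above can be approximated by hand with the existing index. The following is an untested sketch, not something the planner does on its own: the CTE name top12 and the alias e are made up for illustration, and whether each step really uses idx_in_social_engagement depends on the plan chosen.

```sql
-- Step 1: order by engagement(social) alone and stop once 12 rows pass
-- every filter. The smallest value kept is the 12th tuple's engagement.
WITH top12 AS (
    SELECT engagement(social) AS e
    FROM users
    WHERE follower_count(social) < 500000
      AND engagement(social) > 0.03
      AND engagement(social) < 0.25
      AND peemv(social) > 0.0
      AND peemv(social) < 533
    ORDER BY engagement(social) DESC NULLS LAST
    LIMIT 12
)
-- Step 2: collect every qualifying row at or above that value (this picks up
-- ties with the 12th row), then apply the full ORDER BY and the LIMIT.
SELECT users.*
FROM users
WHERE follower_count(social) < 500000
  AND engagement(social) > 0.03
  AND engagement(social) < 0.25
  AND peemv(social) > 0.0
  AND peemv(social) < 533
  AND engagement(social) >= (SELECT min(e) FROM top12)
ORDER BY engagement(social) DESC NULLS LAST, created_at ASC
LIMIT 12;
```

The final sort only sees the rows at or above the 12th engagement value, so it stays small unless many rows tie on that value. The composite index remains the simpler fix.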