How to optimize a window query in Postgres

Jim*_*ath 7 postgresql performance index-tuning window-functions rank query-performance

I have the following table with about 175k records:

    Column     |            Type             |              Modifiers
----------------+-----------------------------+-------------------------------------
 id             | uuid                        | not null default uuid_generate_v4()
 competition_id | uuid                        | not null
 user_id        | uuid                        | not null
 first_name     | character varying(255)      | not null
 last_name      | character varying(255)      | not null
 image          | character varying(255)      |
 country        | character varying(255)      |
 slug           | character varying(255)      | not null
 total_votes    | integer                     | not null default 0
 created_at     | timestamp without time zone |
 updated_at     | timestamp without time zone |
 featured_until | timestamp without time zone |
 image_src      | character varying(255)      |
 hidden         | boolean                     | not null default false
 photos_count   | integer                     | not null default 0
 photo_id       | uuid                        |
Indexes:
    "entries_pkey" PRIMARY KEY, btree (id)
    "index_entries_on_competition_id" btree (competition_id)
    "index_entries_on_featured_until" btree (featured_until)
    "index_entries_on_hidden" btree (hidden)
    "index_entries_on_photo_id" btree (photo_id)
    "index_entries_on_slug" btree (slug)
    "index_entries_on_total_votes" btree (total_votes)
    "index_entries_on_user_id" btree (user_id)

I am running the following query to get an entry's rank along with the slugs of the next and previous entries:

WITH entry_with_global_rank AS ( 
  SELECT id
       , rank() OVER w AS global_rank
       , LAG(slug) OVER w AS previous_slug
       , LEAD(slug) OVER w AS next_slug
  FROM entries 
  WHERE competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b' 
  WINDOW w AS (PARTITION BY competition_id ORDER BY total_votes DESC) 
) 
SELECT * 
FROM entry_with_global_rank 
WHERE id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0' 
LIMIT 1;

Here is the output of EXPLAIN:

                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Limit  (cost=516228.88..516233.37 rows=1 width=88)
   CTE entry_with_global_rank
     ->  WindowAgg  (cost=510596.59..516228.88 rows=250324 width=52)
           ->  Sort  (cost=510596.59..511222.40 rows=250324 width=52)
                 Sort Key: entries.total_votes
                 ->  Seq Scan on entries  (cost=0.00..488150.74 rows=250324 width=52)
                       Filter: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
   ->  CTE Scan on entry_with_global_rank  (cost=0.00..5632.29 rows=1252 width=88)
         Filter: (id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid)
(9 rows)

This query takes about 1400 ms; is there any way to speed it up?

Edit:

Here is the output of EXPLAIN ANALYZE:

                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=516228.88..516233.37 rows=1 width=88) (actual time=1232.824..1232.824 rows=1 loops=1)
   CTE entry_with_global_rank
     ->  WindowAgg  (cost=510596.59..516228.88 rows=250324 width=52) (actual time=1202.101..1226.846 rows=8727 loops=1)
           ->  Sort  (cost=510596.59..511222.40 rows=250324 width=52) (actual time=1202.069..1213.992 rows=8728 loops=1)
                 Sort Key: entries.total_votes
                 Sort Method: quicksort  Memory: 8128kB
                 ->  Seq Scan on entries  (cost=0.00..488150.74 rows=250324 width=52) (actual time=89.970..1174.083 rows=50335 loops=1)
                       Filter: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
                       Rows Removed by Filter: 125477
   ->  CTE Scan on entry_with_global_rank  (cost=0.00..5632.29 rows=1252 width=88) (actual time=1232.822..1232.822 rows=1 loops=1)
         Filter: (id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid)
         Rows Removed by Filter: 8726
 Total runtime: 1234.424 ms
(13 rows)

Edit 2:

I ran VACUUM ANALYZE on the database and the query time has improved, but I am sure there must be some way to improve performance further:

                                                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=475372.26..475376.76 rows=1 width=88) (actual time=138.388..138.388 rows=1 loops=1)
   CTE entry_with_global_rank
     ->  WindowAgg  (cost=470662.23..475372.26 rows=209335 width=35) (actual time=125.489..132.214 rows=4178 loops=1)
           ->  Sort  (cost=470662.23..471185.56 rows=209335 width=35) (actual time=125.462..126.724 rows=4179 loops=1)
                 Sort Key: entries.total_votes
                 Sort Method: quicksort  Memory: 5510kB
                 ->  Bitmap Heap Scan on entries  (cost=71390.90..452161.77 rows=209335 width=35) (actual time=29.381..87.130 rows=50390 loops=1)
                       Recheck Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
                       ->  Bitmap Index Scan on index_entries_on_competition_id  (cost=0.00..71338.56 rows=209335 width=0) (actual time=23.593..23.593 rows=51257 loops=1)
                             Index Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
   ->  CTE Scan on entry_with_global_rank  (cost=0.00..4710.04 rows=1047 width=88) (actual time=138.387..138.387 rows=1 loops=1)
         Filter: (id = '9470ec4f-fed1-4f95-bbed-1e3dbba5f53b'::uuid)
         Rows Removed by Filter: 4177
 Total runtime: 138.588 ms
(14 rows)

Edit 3:

As requested, here is the final query plan with the covering index, taken right after a VACUUM ANALYZE:

                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..6771.99 rows=1 width=88) (actual time=46.765..46.765 rows=1 loops=1)
   ->  Subquery Scan on entry_with_global_rank  (cost=0.42..6771.99 rows=1 width=88) (actual time=46.763..46.763 rows=1 loops=1)
         Filter: (entry_with_global_rank.id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid)
         Rows Removed by Filter: 9128
         ->  WindowAgg  (cost=0.42..5635.06 rows=90955 width=35) (actual time=0.090..40.002 rows=9129 loops=1)
               ->  Index Only Scan using entries_extra_special_idx on entries  (cost=0.42..3815.96 rows=90955 width=35) (actual time=0.071..10.973 rows=9130 loops=1)
                     Index Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
                     Heap Fetches: 166
 Total runtime: 46.867 ms
(9 rows)

Erw*_*ter 7

The CTE is not needed here and acts as an optimization barrier. A plain subquery generally performs better:

SELECT * 
FROM  (
   SELECT id
         ,rank()     OVER w AS global_rank
         ,lag(slug)  OVER w AS previous_slug
         ,lead(slug) OVER w AS next_slug 
   FROM   entries 
   WHERE  competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b' 
   WINDOW w AS (ORDER BY total_votes DESC) 
   ) entry_with_global_rank 
WHERE  id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0' 
LIMIT  1;

As @Daniel commented, I removed the PARTITION BY clause from the window definition, since you restrict the query to a single competition_id anyway.
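For readers on PostgreSQL 12 or later (an assumption: the "Total runtime" line in the plans above points to an older release), the CTE form can be kept while opting out of materialization with NOT MATERIALIZED. A minimal sketch reusing the query from the question; on those versions a side-effect-free CTE that is referenced only once is normally inlined anyway, so the keyword mainly makes the intent explicit:

```sql
-- Assumes PostgreSQL 12+; NOT MATERIALIZED does not exist on older releases.
WITH entry_with_global_rank AS NOT MATERIALIZED (
   SELECT id
        , rank()     OVER w AS global_rank
        , lag(slug)  OVER w AS previous_slug
        , lead(slug) OVER w AS next_slug
   FROM   entries
   WHERE  competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'
   WINDOW w AS (ORDER BY total_votes DESC)
)
SELECT *
FROM   entry_with_global_rank
WHERE  id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'
LIMIT  1;
```

Either way, the outer filter on id cannot be pushed below the window functions, because every row of the competition has to be ranked before one of them can be picked.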

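Table layout

You could optimize your table layout by reordering the columns to slightly reduce the on-disk storage size, which makes everything a bit faster: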

     Column     |            Type             |              Modifiers
----------------+-----------------------------+-------------------------------------
 id             | uuid                        | not null default uuid_generate_v4()
 competition_id | uuid                        | not null
 user_id        | uuid                        | not null
 total_votes    | integer                     | not null default 0
 photos_count   | integer                     | not null default 0
 hidden         | boolean                     | not null default false
 slug           | character varying(255)      | not null
 first_name     | character varying(255)      | not null
 last_name      | character varying(255)      | not null
 image          | character varying(255)      |
 country        | character varying(255)      |
 image_src      | character varying(255)      |
 photo_id       | uuid                        |
 created_at     | timestamp without time zone |
 updated_at     | timestamp without time zone |
 featured_until | timestamp without time zone |
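Postgres cannot reorder the columns of an existing table in place, so trying this layout means building a new table and copying the data over. A rough sketch under that assumption; entries_new is a hypothetical name, and constraints, indexes and the final rename are left out:

```sql
-- Sketch only: entries_new is a hypothetical name; column order follows the layout above.
-- uuid_generate_v4() comes from the uuid-ossp extension, as in the original table.
CREATE TABLE entries_new (
   id             uuid PRIMARY KEY DEFAULT uuid_generate_v4()
 , competition_id uuid         NOT NULL
 , user_id        uuid         NOT NULL
 , total_votes    integer      NOT NULL DEFAULT 0
 , photos_count   integer      NOT NULL DEFAULT 0
 , hidden         boolean      NOT NULL DEFAULT false
 , slug           varchar(255) NOT NULL
 , first_name     varchar(255) NOT NULL
 , last_name      varchar(255) NOT NULL
 , image          varchar(255)
 , country        varchar(255)
 , image_src      varchar(255)
 , photo_id       uuid
 , created_at     timestamp
 , updated_at     timestamp
 , featured_until timestamp
);

-- Copy the rows, naming the columns explicitly so the differing order does not matter.
INSERT INTO entries_new
      (id, competition_id, user_id, total_votes, photos_count, hidden, slug, first_name,
       last_name, image, country, image_src, photo_id, created_at, updated_at, featured_until)
SELECT id, competition_id, user_id, total_votes, photos_count, hidden, slug, first_name,
       last_name, image, country, image_src, photo_id, created_at, updated_at, featured_until
FROM   entries;

-- Compare the heap sizes of both tables.
SELECT pg_size_pretty(pg_relation_size('entries'))     AS old_heap
     , pg_size_pretty(pg_relation_size('entries_new')) AS new_heap;
```

Part of any difference may come from bloat in the old table rather than from the column order, so treat the comparison as a rough indication only.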

More about that:

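Also, do you really need all those uuid columns? Would int or bigint not work for you? That would make the table and indexes smaller and everything faster.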
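To put numbers on the size difference, pg_column_size() reports how many bytes a value of each type occupies. A quick check, using one of the uuid literals from the question as sample input:

```sql
-- Per-value storage size of the three key types under discussion.
SELECT pg_column_size('bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid) AS uuid_bytes    -- 16
     , pg_column_size(1::bigint)                                     AS bigint_bytes  -- 8
     , pg_column_size(1::int)                                        AS int_bytes;    -- 4
```

With three non-null uuid columns plus photo_id per row, and a uuid in most of the indexes, the difference adds up across table and indexes.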

I would just use text for character data, but that won't help the performance of this query.

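Aside: character varying(255) is almost always pointless in Postgres. Some other RDBMS benefit from a length restriction; for Postgres it makes no difference (unless you actually need to enforce the unlikely maximum length of 255 characters).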
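If you decide to switch, the change could look roughly like the following. This is a sketch, not a tested migration: the constraint name is hypothetical, only three of the varchar columns are shown, and it is worth trying on a copy first because ALTER TABLE takes a strong lock on the table while it runs.

```sql
-- Convert some of the varchar(255) columns to text (repeat for the others as needed).
ALTER TABLE entries
   ALTER COLUMN slug       TYPE text
 , ALTER COLUMN first_name TYPE text
 , ALTER COLUMN last_name  TYPE text;

-- Only if a length limit is really required; entries_slug_max_len is a hypothetical name.
ALTER TABLE entries
   ADD CONSTRAINT entries_slug_max_len CHECK (length(slug) <= 255);
```

A CHECK constraint keeps the limit in one place and can later be changed or dropped without touching the column type.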

Special index

Finally, you could build a highly specialized index (only if the cost of maintaining the index is worth the special casing):

CREATE INDEX entries_special_idx ON entries (competition_id, total_votes DESC, id, slug);

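Appending (id, slug) to the index only makes sense if you can get index-only scans out of it. (Disabled autovacuum or lots of concurrent writes would negate that effort.) Otherwise remove the last two columns.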
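One way to check whether the index-only scan actually materializes is to vacuum the table and look at the plan. A minimal sketch, assuming the entries_special_idx index from above has been created:

```sql
-- Refresh the visibility map and planner statistics first.
VACUUM ANALYZE entries;

-- Then inspect the plan of the rewritten query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM  (
   SELECT id
        , rank()     OVER w AS global_rank
        , lag(slug)  OVER w AS previous_slug
        , lead(slug) OVER w AS next_slug
   FROM   entries
   WHERE  competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'
   WINDOW w AS (ORDER BY total_votes DESC)
   ) entry_with_global_rank
WHERE  id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'
LIMIT  1;
```

In the output, look for an "Index Only Scan" node on the new index and a "Heap Fetches" count that is small relative to the rows scanned. A high count means many pages are not marked all-visible, and most of the benefit is lost.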

While you are at it, audit your indexes. Are they all in use? There might be some dead weight here.
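The statistics view pg_stat_user_indexes gives a first impression of which indexes are being used. A sketch for the entries table:

```sql
-- Indexes on entries, least-used first, with their on-disk size.
SELECT indexrelname                                 AS index_name
     , idx_scan                                     AS scans_since_stats_reset
     , pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  relname = 'entries'
ORDER  BY idx_scan, pg_relation_size(indexrelid) DESC;
```

An idx_scan of 0 marks a candidate for removal, not a verdict: the counters only cover the period since the statistics were last reset, and an index backing a primary key or unique constraint is needed even if it is never scanned.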