Jim*_*ath 7 postgresql performance index-tuning window-functions rank query-performance
I have the following table with about 175k records:
Column | Type | Modifiers
----------------+-----------------------------+-------------------------------------
id | uuid | not null default uuid_generate_v4()
competition_id | uuid | not null
user_id | uuid | not null
first_name | character varying(255) | not null
last_name | character varying(255) | not null
image | character varying(255) |
country | character varying(255) |
slug | character varying(255) | not null
total_votes | integer | not null default 0
created_at | timestamp without time zone |
updated_at | timestamp without time zone |
featured_until | timestamp without time zone |
image_src | character varying(255) |
hidden | boolean | not null default false
photos_count | integer | not null default 0
photo_id | uuid |
Indexes:
"entries_pkey" PRIMARY KEY, btree (id)
"index_entries_on_competition_id" btree (competition_id)
"index_entries_on_featured_until" btree (featured_until)
"index_entries_on_hidden" btree (hidden)
"index_entries_on_photo_id" btree (photo_id)
"index_entries_on_slug" btree (slug)
"index_entries_on_total_votes" btree (total_votes)
"index_entries_on_user_id" btree (user_id)
I'm running the following query to get an entry's rank along with the slugs of the previous and next entries:
WITH entry_with_global_rank AS (
SELECT id
, rank() OVER w AS global_rank
, LAG(slug) OVER w AS previous_slug
, LEAD(slug) OVER w AS next_slug
FROM entries
WHERE competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'
WINDOW w AS (PARTITION BY competition_id ORDER BY total_votes DESC)
)
SELECT *
FROM entry_with_global_rank
WHERE id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'
LIMIT 1;
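The following minimal sketch shows what the query returns. It uses Python's built-in sqlite3 module with SQLite 3.25 or later as a stand-in for PostgreSQL. SQLite has no uuid type, so short text ids replace the UUIDs, and the rows are made up. The vote counts have no ties, so the previous and next neighbors are deterministic.

```python
import sqlite3

# Stand-in for PostgreSQL: SQLite >= 3.25 supports rank(), lag(), lead() and named WINDOW clauses.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id TEXT PRIMARY KEY, competition_id TEXT NOT NULL,"
    " slug TEXT NOT NULL, total_votes INTEGER NOT NULL DEFAULT 0)"
)
# Made-up sample rows: one competition, distinct vote counts (no ties).
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [("e1", "c1", "alice", 40), ("e2", "c1", "bob", 25),
     ("e3", "c1", "carol", 12), ("e4", "c1", "dave", 3)],
)

# Same shape as the query above, with placeholders instead of literal UUIDs.
query = """
WITH entry_with_global_rank AS (
  SELECT id
       , rank() OVER w AS global_rank
       , lag(slug) OVER w AS previous_slug
       , lead(slug) OVER w AS next_slug
  FROM entries
  WHERE competition_id = ?
  WINDOW w AS (PARTITION BY competition_id ORDER BY total_votes DESC)
)
SELECT * FROM entry_with_global_rank WHERE id = ? LIMIT 1
"""
row = conn.execute(query, ("c1", "e2")).fetchone()
print(row)  # ('e2', 2, 'alice', 'carol')

# The top entry has no previous neighbor, so lag() yields NULL (None in Python).
top = conn.execute(query, ("c1", "e1")).fetchone()
print(top)  # ('e1', 1, None, 'bob')
```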
Here is the result of EXPLAIN:
QUERY PLAN
-----------------------------------------------------------------------------------------------
Limit (cost=516228.88..516233.37 rows=1 width=88)
CTE entry_with_global_rank
-> WindowAgg (cost=510596.59..516228.88 rows=250324 width=52)
-> Sort (cost=510596.59..511222.40 rows=250324 width=52)
Sort Key: entries.total_votes
-> Seq Scan on entries (cost=0.00..488150.74 rows=250324 width=52)
Filter: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
-> CTE Scan on entry_with_global_rank (cost=0.00..5632.29 rows=1252 width=88)
Filter: (id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid)
(9 rows)
This query takes about 1400 ms. Is there any way to speed it up?
Edit:
Here is the result of EXPLAIN ANALYZE:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=516228.88..516233.37 rows=1 width=88) (actual time=1232.824..1232.824 rows=1 loops=1)
CTE entry_with_global_rank
-> WindowAgg (cost=510596.59..516228.88 rows=250324 width=52) (actual time=1202.101..1226.846 rows=8727 loops=1)
-> Sort (cost=510596.59..511222.40 rows=250324 width=52) (actual time=1202.069..1213.992 rows=8728 loops=1)
Sort Key: entries.total_votes
Sort Method: quicksort Memory: 8128kB
-> Seq Scan on entries (cost=0.00..488150.74 rows=250324 width=52) (actual time=89.970..1174.083 rows=50335 loops=1)
Filter: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
Rows Removed by Filter: 125477
-> CTE Scan on entry_with_global_rank (cost=0.00..5632.29 rows=1252 width=88) (actual time=1232.822..1232.822 rows=1 loops=1)
Filter: (id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid)
Rows Removed by Filter: 8726
Total runtime: 1234.424 ms
(13 rows)
Edit 2:
I ran VACUUM ANALYZE on the database, and the query time has improved, but I'm sure there must be some way to improve performance further:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=475372.26..475376.76 rows=1 width=88) (actual time=138.388..138.388 rows=1 loops=1)
CTE entry_with_global_rank
-> WindowAgg (cost=470662.23..475372.26 rows=209335 width=35) (actual time=125.489..132.214 rows=4178 loops=1)
-> Sort (cost=470662.23..471185.56 rows=209335 width=35) (actual time=125.462..126.724 rows=4179 loops=1)
Sort Key: entries.total_votes
Sort Method: quicksort Memory: 5510kB
-> Bitmap Heap Scan on entries (cost=71390.90..452161.77 rows=209335 width=35) (actual time=29.381..87.130 rows=50390 loops=1)
Recheck Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
-> Bitmap Index Scan on index_entries_on_competition_id (cost=0.00..71338.56 rows=209335 width=0) (actual time=23.593..23.593 rows=51257 loops=1)
Index Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
-> CTE Scan on entry_with_global_rank (cost=0.00..4710.04 rows=1047 width=88) (actual time=138.387..138.387 rows=1 loops=1)
Filter: (id = '9470ec4f-fed1-4f95-bbed-1e3dbba5f53b'::uuid)
Rows Removed by Filter: 4177
Total runtime: 138.588 ms
(14 rows)
Edit 3:
As requested, here is the final query plan with the covering index, taken right after a VACUUM ANALYZE:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..6771.99 rows=1 width=88) (actual time=46.765..46.765 rows=1 loops=1)
-> Subquery Scan on entry_with_global_rank (cost=0.42..6771.99 rows=1 width=88) (actual time=46.763..46.763 rows=1 loops=1)
Filter: (entry_with_global_rank.id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid)
Rows Removed by Filter: 9128
-> WindowAgg (cost=0.42..5635.06 rows=90955 width=35) (actual time=0.090..40.002 rows=9129 loops=1)
-> Index Only Scan using entries_extra_special_idx on entries (cost=0.42..3815.96 rows=90955 width=35) (actual time=0.071..10.973 rows=9130 loops=1)
Index Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid)
Heap Fetches: 166
Total runtime: 46.867 ms
(9 rows)
The CTE is not needed here and acts as an optimization barrier. A plain subquery typically performs better:
SELECT *
FROM (
SELECT id
,rank() OVER w AS global_rank
,lag(slug) OVER w AS previous_slug
,lead(slug) OVER w AS next_slug
FROM entries
WHERE competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'
WINDOW w AS (ORDER BY total_votes DESC)
) entry_with_global_rank
WHERE id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'
LIMIT 1;
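The rewrite changes only what the planner is allowed to do, not the result. The sketch below checks that with SQLite standing in for PostgreSQL and made-up rows: both forms return the same row. SQLite cannot show the PostgreSQL optimization fence itself. As a version note: the fence applies to PostgreSQL 11 and earlier. From PostgreSQL 12 on, a non-recursive, side-effect-free CTE referenced only once is inlined by default unless written `AS MATERIALIZED`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id TEXT PRIMARY KEY, competition_id TEXT NOT NULL,"
    " slug TEXT NOT NULL, total_votes INTEGER NOT NULL DEFAULT 0)"
)
# Made-up rows in two competitions; vote counts are distinct within each.
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [("e1", "c1", "alice", 40), ("e2", "c1", "bob", 25), ("e3", "c1", "carol", 12),
     ("e4", "c2", "dave", 30), ("e5", "c2", "erin", 5)],
)

# Shared window query body; only the wrapping (CTE vs. subquery) differs below.
inner = """
  SELECT id
       , rank() OVER w AS global_rank
       , lag(slug) OVER w AS previous_slug
       , lead(slug) OVER w AS next_slug
  FROM entries
  WHERE competition_id = :comp
  WINDOW w AS (ORDER BY total_votes DESC)
"""
cte_sql = f"WITH entry_with_global_rank AS ({inner}) SELECT * FROM entry_with_global_rank WHERE id = :id LIMIT 1"
sub_sql = f"SELECT * FROM ({inner}) entry_with_global_rank WHERE id = :id LIMIT 1"

params = {"comp": "c1", "id": "e3"}
cte_row = conn.execute(cte_sql, params).fetchone()
sub_row = conn.execute(sub_sql, params).fetchone()
print(cte_row, sub_row)  # ('e3', 3, 'bob', None) ('e3', 3, 'bob', None)
```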
As @Daniel commented, I removed the PARTITION BY clause from the window definition, since you restrict the query to a single competition_id anyway.
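The following sketch checks that reasoning, again with SQLite standing in for PostgreSQL and with made-up rows. Once the WHERE clause leaves a single competition_id, the partitioned and unpartitioned windows produce identical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id TEXT PRIMARY KEY, competition_id TEXT NOT NULL,"
    " slug TEXT NOT NULL, total_votes INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [("e1", "c1", "alice", 40), ("e2", "c1", "bob", 25), ("e3", "c1", "carol", 12),
     ("e4", "c2", "dave", 30)],
)

def ranked(window_def):
    # Same select list as the answer's query; only the window definition varies.
    # ORDER BY id just makes the returned row order deterministic for comparison.
    sql = (
        "SELECT id, rank() OVER w, lag(slug) OVER w, lead(slug) OVER w"
        " FROM entries WHERE competition_id = ?"
        f" WINDOW w AS ({window_def}) ORDER BY id"
    )
    return conn.execute(sql, ("c1",)).fetchall()

with_partition = ranked("PARTITION BY competition_id ORDER BY total_votes DESC")
without_partition = ranked("ORDER BY total_votes DESC")
print(with_partition == without_partition)  # True
```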
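You could optimize your table layout by reordering the columns to minimize alignment padding. This reduces the storage size on disk a little, which makes everything a bit faster. For example: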
Column | Type | Modifiers
----------------+-----------------------------+-------------------------------------
id | uuid | not null default uuid_generate_v4()
competition_id | uuid | not null
user_id | uuid | not null
total_votes | integer | not null default 0
photos_count | integer | not null default 0
hidden | boolean | not null default false
slug | character varying(255) | not null
first_name | character varying(255) | not null
last_name | character varying(255) | not null
image | character varying(255) |
country | character varying(255) |
image_src | character varying(255) |
photo_id | uuid |
created_at | timestamp without time zone |
updated_at | timestamp without time zone |
featured_until | timestamp without time zone |
More on this:
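Also, do you really need all those uuid columns? Wouldn't int or bigint work for you? That would make the table and its indexes smaller and everything faster.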
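Here is a back-of-the-envelope estimate of the raw savings. It assumes the roughly 175k rows from the question and counts only the key bytes themselves, with no tuple headers, alignment padding, or index-page overhead, so real savings will differ.

```python
import struct
import uuid

ROWS = 175_000  # approximate row count stated in the question

uuid_bytes = len(uuid.uuid4().bytes)  # a UUID is 128 bits = 16 bytes
bigint_bytes = struct.calcsize("<q")  # 64-bit integer, same width as Postgres bigint
int_bytes = struct.calcsize("<i")     # 32-bit integer, same width as Postgres integer

# Raw key bytes saved per converted column, once in the table and again in
# each index on that column; ignores headers, padding and page overhead.
saved_bigint = ROWS * (uuid_bytes - bigint_bytes)
saved_int = ROWS * (uuid_bytes - int_bytes)
print(uuid_bytes, bigint_bytes, int_bytes)  # 16 8 4
print(saved_bigint, saved_int)              # 1400000 2100000
```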
I would just use text for character data, but that won't help the performance of this query.
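Aside: character varying(255) is almost always pointless in Postgres. Some other RDBMS benefit from a length limit, but for Postgres it makes no difference (unless you actually need to enforce the unlikely maximum length of 255 characters).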
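If a maximum length really must be enforced, one common alternative (not mentioned in the answer above) is a text column with an explicit CHECK constraint. The sketch below assumes SQLite as a stand-in for PostgreSQL. Both support `length()` and CHECK constraints, and the table is a hypothetical cut-down version of entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Plain text plus an explicit CHECK, instead of character varying(255).
conn.execute(
    "CREATE TABLE entries (id TEXT PRIMARY KEY,"
    " slug TEXT NOT NULL CHECK (length(slug) <= 255))"
)
conn.execute("INSERT INTO entries VALUES (?, ?)", ("e1", "a" * 255))  # accepted

try:
    conn.execute("INSERT INTO entries VALUES (?, ?)", ("e2", "a" * 256))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the CHECK constraint fired

print(rejected)  # True
```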
Finally, you could build a highly specialized index (only if special-casing this query is worth the extra index maintenance):
CREATE INDEX entries_special_idx ON entries (competition_id, total_votes DESC, id, slug);
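Adding (id, slug) to the index only makes sense if you can get index-only scans out of it. (Disabled autovacuum or lots of concurrent writes would negate that effort.) Otherwise, drop the last two columns.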
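SQLite's "covering index" gives a rough analogue of this idea. The sketch below uses SQLite as a stand-in and asks its planner whether the scan feeding the window functions can be answered from the suggested index alone. Unlike PostgreSQL, SQLite has no visibility map, so this shows only the column-coverage half of the story. The `first_name` query is a hypothetical contrast case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id TEXT PRIMARY KEY, competition_id TEXT NOT NULL,"
    " slug TEXT NOT NULL, first_name TEXT, total_votes INTEGER NOT NULL DEFAULT 0)"
)
# Same column list as the specialized index above.
conn.execute(
    "CREATE INDEX entries_special_idx"
    " ON entries (competition_id, total_votes DESC, id, slug)"
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("c1",))]

# The scan feeding the window functions: every column it reads is in the index.
covered = plan(
    "SELECT id, slug, total_votes FROM entries"
    " WHERE competition_id = ? ORDER BY total_votes DESC"
)
# first_name is not in the index, so this query cannot be answered from the index alone.
uncovered = plan(
    "SELECT id, first_name FROM entries"
    " WHERE competition_id = ? ORDER BY total_votes DESC"
)
print(covered)
print(uncovered)
```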
While you're at it, audit your indexes. Are they all in use? There may be some dead freight here.