Asked by Yar*_*rin (score 5) · Tags: postgresql, performance, query-performance
I have a Postgres materialized view:
       Column        |       Type        | Modifiers
---------------------+-------------------+-----------
 document_id         | character varying |
 recorded_date       | date              |
 parcels             | jsonb             |
Indexes:
    "index_my_view_on_document_id" btree (document_id)
    "index_my_view_on_recorded_date" btree (recorded_date)
    "index_my_view_on_parcels" gin (parcels)
I'm trying to run a paginated query that filters on the parcels jsonb array field, but whenever I add a LIMIT, performance collapses:
Without LIMIT:
EXPLAIN ANALYZE SELECT document_id FROM my_view WHERE (parcels @> '[3022890014]') ORDER BY recorded_date DESC;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=24178.50..24194.79 rows=6518 width=21) (actual time=11.272..11.275 rows=22 loops=1)
Sort Key: recorded_date DESC
Sort Method: quicksort Memory: 26kB
-> Bitmap Heap Scan on my_view (cost=78.51..23765.58 rows=6518 width=21) (actual time=3.199..10.281 rows=22 loops=1)
Recheck Cond: (parcels @> '[3022890014]'::jsonb)
Heap Blocks: exact=12
-> Bitmap Index Scan on index_my_view_on_parcels (cost=0.00..76.88 rows=6518 width=0) (actual time=3.166..3.166 rows=22 loops=1)
Index Cond: (parcels @> '[3022890014]'::jsonb)
Planning time: 2.201 ms
Execution time: 11.395 ms
(10 rows)
With LIMIT:
EXPLAIN ANALYZE SELECT document_id FROM my_view WHERE (parcels @> '[3022890014]') ORDER BY recorded_date DESC LIMIT 25;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..2514.14 rows=25 width=21) (actual time=10471.981..17971.454 rows=22 loops=1)
-> Index Scan Backward using index_my_view_on_recorded_date on my_view (cost=0.43..655374.28 rows=6518 width=21) (actual time=10471.980..17971.446 rows=22 loops=1)
Filter: (parcels @> '[3022890014]'::jsonb)
Rows Removed by Filter: 6517780
Planning time: 0.164 ms
Execution time: 17972.229 ms
(6 rows)
Adding LIMIT makes the query more than 1000 times slower (about 18 s instead of about 11 ms)!
I was able to work around this by using a nested query, as suggested here:
EXPLAIN ANALYZE SELECT * FROM (SELECT document_id, recorded_date FROM my_view WHERE (parcels @> '[3022890014]') ORDER BY recorded_date DESC) subq ORDER BY recorded_date DESC LIMIT 25;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=24178.50..24178.81 rows=25 width=21) (actual time=2.180..2.183 rows=22 loops=1)
-> Sort (cost=24178.50..24194.79 rows=6518 width=21) (actual time=2.179..2.179 rows=22 loops=1)
Sort Key: my_view.recorded_date DESC
Sort Method: quicksort Memory: 26kB
-> Bitmap Heap Scan on my_view (cost=78.51..23765.58 rows=6518 width=21) (actual time=2.064..2.166 rows=22 loops=1)
Recheck Cond: (parcels @> '[3022890014]'::jsonb)
Heap Blocks: exact=12
-> Bitmap Index Scan on index_my_view_on_parcels (cost=0.00..76.88 rows=6518 width=0) (actual time=2.030..2.030 rows=22 loops=1)
Index Cond: (parcels @> '[3022890014]'::jsonb)
Planning time: 6.427 ms
Execution time: 2.230 ms
(11 rows)
Still, I'd like to understand why adding LIMIT causes such a dramatic change in performance, and whether there is a better way to solve this.
Answer by jja*_*nes (score 10)
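PostgreSQL thinks it will find 6518 rows that meet your condition. So when you tell it to stop at 25, it figures it would rather scan the rows that are already in order and stop once it finds the 25th matching one, which it expects to happen after reading 25/6518, or about 0.4%, of the table. But in reality only 22 rows meet the condition, so it never reaches 25 and ends up scanning the entire table, which is about 250 times more work than it expected. The other plan, using the gin index, ends up being about 250 times less work than PostgreSQL thought, for the same reason: it expects to find and sort 6518 rows, when there are actually only 22.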
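The planner's choice can be reproduced from the numbers in the plans above. This is a minimal model, assuming a Limit node's cost is a straight-line interpolation between its child's startup and total cost by the fraction limit/estimated_rows, which matches the 0.43..2514.14 shown in the LIMIT plan. The helper limit_total_cost is hypothetical, not a PostgreSQL API.

```python
# Minimal model of the planner's choice for the LIMIT 25 query.
# Assumption: a Limit node's cost is linearly interpolated from its child's
# (startup, total) cost by the fraction of estimated rows it expects to fetch.

def limit_total_cost(startup: float, total: float, est_rows: float, limit: int) -> float:
    """Hypothetical helper: estimated cost of stopping after `limit` rows."""
    fraction = min(limit / est_rows, 1.0)
    return startup + (total - startup) * fraction


EST_ROWS = 6518   # planner's guess for parcels @> '[3022890014]'
LIMIT = 25

# Plan chosen with LIMIT: backward scan of index_my_view_on_recorded_date,
# filtering each row and stopping at the 25th match.
index_plan = limit_total_cost(0.43, 655374.28, EST_ROWS, LIMIT)

# Plan chosen without LIMIT: GIN bitmap scan feeding a Sort. A sort must read
# all of its input before emitting anything, so its startup cost is paid in full.
sort_plan_startup = 24178.50

print(f"index scan + limit: {index_plan:.2f}")  # 2514.14
print(f"bitmap scan + sort: >= {sort_plan_startup:.2f}")
print("planner picks index scan:", index_plan < sort_plan_startup)

# Reality: only 22 rows matched, so the scan never reached 25 and read the
# whole view ("Rows Removed by Filter" plus the rows returned).
view_rows = 6517780 + 22
expected_rows_read = view_rows * LIMIT / EST_ROWS
print(f"expected to read ~{expected_rows_read:,.0f} rows, actually read {view_rows:,}")
print(f"work underestimated by ~{view_rows / expected_rows_read:.0f}x")
```

With unrounded numbers the underestimate comes out near 261x; the answer's "about 250 times" comes from rounding 25/6518 to 0.4%.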
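If you used a more appropriate data structure, such as a regular PostgreSQL array instead of a degenerate JSONB object, the planner would have a better idea of how many rows meet the condition and would probably make a better choice.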
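The 6518 estimate itself looks like a fixed guess rather than a data-driven one: it is almost exactly 0.001 of the roughly 6.5 million rows the full scan read. The sketch below checks that arithmetic, assuming the server is an older release on which jsonb @> is estimated with the contsel estimator, which (to my knowledge) always returns a constant 0.001. With a regular array column, ANALYZE can instead collect per-element statistics for the planner to consult. The constant name is illustrative only.

```python
# Sketch: where the estimate of 6518 rows plausibly comes from.
# Assumption: jsonb @> is estimated with a fixed selectivity of 0.001
# (contsel on older releases), independent of the value searched for.

ASSUMED_CONTAINMENT_SELECTIVITY = 0.001

# Total rows in the view, from "Rows Removed by Filter" plus the 22 matches.
view_rows = 6517780 + 22
estimated_matches = round(view_rows * ASSUMED_CONTAINMENT_SELECTIVITY)
actual_matches = 22

print(estimated_matches)  # 6518, the rows= figure in every plan above
# Because the selectivity is a constant, any parcel number, rare or common,
# would receive this same estimate.
print(f"overestimate: ~{estimated_matches / actual_matches:.0f}x")
```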
Viewed 5741 times