hap*_*set 13 performance postgresql-9.3 slow-log postgresql-performance
我有items
以下架构的表(在 postgres v9.3.5 中):
Column | Type | Modifiers | Storage
-----------+--------+----------------------------------------------------+----------
id | bigint | not null default nextval('items_id_seq'::regclass) | plain
data | text | not null | extended
object_id | bigint | not null | plain
Indexes:
"items_pkey" PRIMARY KEY, btree (id)
"items_object_id_idx" btree (object_id)
Has OIDs: no
Run Code Online (Sandbox Code Playgroud)
当我执行查询时,它会挂起很长时间:
SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1;
Run Code Online (Sandbox Code Playgroud)
在 VACUUM ANALYZE 之后,查询执行改进了很多,但仍然不完美。
# EXPLAIN ANALYZE SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.44..1269.14 rows=1 width=63) (actual time=873796.061..873796.061 rows=0 loops=1)
-> Index Scan Backward using items_pkey on items (cost=0.44..1164670.11 rows=918 width=63) (actual time=873796.059..873796.059 rows=0 loops=1)
Filter: (object_id = 123::bigint)
Rows Removed by Filter: 27942522
Total runtime: 873796.113 ms
(5 rows)
Run Code Online (Sandbox Code Playgroud)
奇怪的是,当我执行
SELECT * FROM "items" WHERE "object_id" = '123' LIMIT 1;
Run Code Online (Sandbox Code Playgroud)
它返回 0 行,我可以在我的代码中执行它来优化我的 Web 应用程序的性能,但为什么它可以由 Postgres 本身完成?我从 MySQL 来到 Postgres,从未在那里看到过如此奇怪的事情。
======
我发现它使用不同的查询计划,不同的索引,但为什么呢?
# EXPLAIN ANALYZE SELECT * FROM "items" WHERE "object_id" = '123' LIMIT 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.56..3.34 rows=1 width=63) (actual time=0.014..0.014 rows=0 loops=1)
-> Index Scan using items_object_id_operation_idx on items (cost=0.56..2579.16 rows=929 width=63) (actual time=0.013..0.013 rows=0 loops=1)
Index Cond: (object_id = 123::bigint)
Total runtime: 0.029 ms
(4 rows)
Run Code Online (Sandbox Code Playgroud)
mus*_*cio 15
试图解释为什么两个查询之间存在性能差异。
This one:与匹配SELECT * FROM "items" WHERE "object_id" = '123' LIMIT 1
的任何一行都满足object_id
,因此索引 onobject_id
是一个自然的选择。该查询需要最少的 I/O:索引扫描以查找第一个匹配值,再加上一次堆读取以获取整行。
替代方案:SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1
要求所有匹配的行object_id
按另一列排序id
,然后id
返回最大值为 的行。如果要使用索引,则object_id
需要执行以下操作:扫描索引以找到每个匹配项object_id
;对于每场比赛去获取实际的行;然后对所有获取的行进行排序,id
并返回最大的行id
。
优化器选择的替代方案,大概是基于object_id
直方图,是:id
向后扫描索引,整体;对于每个值去取行并检查值是否object_id
匹配;返回第一个匹配行,它将具有最大可能id
值。这种替代方法避免了对行进行排序,所以我猜优化器更喜欢使用 上的索引object_id
。
索引的存在(object_id asc, id desc)
允许另一种选择:扫描此新索引以查找与提供的object_id
值匹配的第一个条目,根据定义,该条目将具有最高id
值;去获取一个匹配的行并返回。显然,这是最有效的方法。
优化查询
SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1;
Run Code Online (Sandbox Code Playgroud)
我做了以下
SELECT * FROM
(SELECT * FROM "items"
WHERE "object_id" = '123'
ORDER BY "id" DESC) AS "items"
ORDER BY "id" DESC LIMIT 1;
Run Code Online (Sandbox Code Playgroud)
它在不添加(object_id asc, id desc)
@mustaccio 建议的索引的情况下有所帮助。
# EXPLAIN SELECT * FROM
(SELECT * FROM "items"
WHERE "object_id" = '123'
ORDER BY "id" DESC) AS "items"
ORDER BY "id" DESC LIMIT 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Limit (cost=16629.84..16629.86 rows=1 width=59)
-> Sort (cost=16629.84..16640.44 rows=4239 width=59)
Sort Key: items.id
-> Bitmap Heap Scan on items (cost=125.42..16374.45 rows=4239 width=59)
Recheck Cond: (object_id = 123::bigint)
-> Bitmap Index Scan on items_object_id_idx (cost=0.00..124.36 rows=4239 width=0)
Index Cond: (object_id = 123::bigint)
(7 rows)
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
17049 次 |
最近记录: |