postgres 在 ORDER BY "id" DESC LIMIT 1 上表现不佳

hap*_*set 13 performance postgresql-9.3 slow-log postgresql-performance

我有items以下架构的表(在 postgres v9.3.5 中):

  Column   | Type   |                         Modifiers                  | Storage  
-----------+--------+----------------------------------------------------+----------
 id        | bigint | not null default nextval('items_id_seq'::regclass) | plain    
 data      | text   | not null                                           | extended 
 object_id | bigint | not null                                           | plain    
Indexes:
    "items_pkey" PRIMARY KEY, btree (id)
    "items_object_id_idx" btree (object_id)
Has OIDs: no
Run Code Online (Sandbox Code Playgroud)

当我执行查询时,它会挂起很长时间:

SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1;
Run Code Online (Sandbox Code Playgroud)

在 VACUUM ANALYZE 之后,查询执行改进了很多,但仍然不完美。

# EXPLAIN ANALYZE SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1;
                                                                            QUERY PLAN                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..1269.14 rows=1 width=63) (actual time=873796.061..873796.061 rows=0 loops=1)
   ->  Index Scan Backward using items_pkey on items  (cost=0.44..1164670.11 rows=918 width=63) (actual time=873796.059..873796.059 rows=0 loops=1)
         Filter: (object_id = 123::bigint)
         Rows Removed by Filter: 27942522
 Total runtime: 873796.113 ms
(5 rows)
Run Code Online (Sandbox Code Playgroud)

奇怪的是,当我执行

SELECT * FROM "items" WHERE "object_id" = '123' LIMIT 1;
Run Code Online (Sandbox Code Playgroud)

它返回 0 行,我可以在我的代码中执行它来优化我的 Web 应用程序的性能,但为什么它可以由 Postgres 本身完成?我从 MySQL 来到 Postgres,从未在那里看到过如此奇怪的事情。

======

我发现它使用不同的查询计划,不同的索引,但为什么呢?

# EXPLAIN ANALYZE SELECT * FROM "items" WHERE "object_id" = '123' LIMIT 1;
                                                                          QUERY PLAN                                    
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.34 rows=1 width=63) (actual time=0.014..0.014 rows=0 loops=1)
   ->  Index Scan using items_object_id_operation_idx on items  (cost=0.56..2579.16 rows=929 width=63) (actual time=0.013..0.013 rows=0 loops=1)
         Index Cond: (object_id = 123::bigint)
 Total runtime: 0.029 ms
(4 rows)
Run Code Online (Sandbox Code Playgroud)

mus*_*cio 15

试图解释为什么两个查询之间存在性能差异。

This one:与匹配SELECT * FROM "items" WHERE "object_id" = '123' LIMIT 1任何一行都满足object_id,因此索引 onobject_id是一个自然的选择。该查询需要最少的 I/O:索引扫描以查找第一个匹配值,再加上一次堆读取以获取整行。

替代方案:SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1要求所有匹配的行object_id按另一列排序id,然后id返回最大值为 的行。如果要使用索引,则object_id需要执行以下操作:扫描索引以找到每个匹配项object_id;对于每场比赛去获取实际的行;然后对所有获取的行进行排序,id并返回最大的行id

优化器选择的替代方案,大概是基于object_id直方图,是:id向后扫描索引,整体;对于每个值去取行并检查值是否object_id匹配;返回第一个匹配行,它将具有最大可能id值。这种替代方法避免了对行进行排序,所以我猜优化器更喜欢使用 上的索引object_id

索引的存在(object_id asc, id desc)允许另一种选择:扫描此新索引以查找与提供的object_id值匹配的第一个条目,根据定义,该条目将具有最高id值;去获取一个匹配的行并返回。显然,这是最有效的方法。


hap*_*set 6

优化查询

SELECT * FROM "items" WHERE "object_id" = '123' ORDER BY "id" DESC LIMIT 1;
Run Code Online (Sandbox Code Playgroud)

我做了以下

SELECT * FROM 
    (SELECT * FROM "items" 
     WHERE "object_id" = '123'
     ORDER BY "id" DESC) AS "items" 
ORDER BY "id" DESC LIMIT 1;
Run Code Online (Sandbox Code Playgroud)

它在不添加(object_id asc, id desc)@mustaccio 建议的索引的情况下有所帮助。

# EXPLAIN SELECT * FROM 
    (SELECT * FROM "items" 
     WHERE "object_id" = '123'
     ORDER BY "id" DESC) AS "items" 
ORDER BY "id" DESC LIMIT 1;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Limit  (cost=16629.84..16629.86 rows=1 width=59)
   ->  Sort  (cost=16629.84..16640.44 rows=4239 width=59)
         Sort Key: items.id
         ->  Bitmap Heap Scan on items  (cost=125.42..16374.45 rows=4239 width=59)
               Recheck Cond: (object_id = 123::bigint)
                   ->  Bitmap Index Scan on items_object_id_idx  (cost=0.00..124.36 rows=4239 width=0)
                     Index Cond: (object_id = 123::bigint)
(7 rows)
Run Code Online (Sandbox Code Playgroud)