Why does adding LIMIT 200 make the query slower?

kas*_*erd 7 postgresql

I am trying to debug a slow query on a PostgreSQL 9.1.13 database, and I am at a bit of a loss. The exact query generated by the ORM framework is:

SELECT "core_product"."sales_price", "core_product"."recommended_price", "core_productgroup"."name", "core_product"."number", "core_product"."name", "core_product"."description", "core_product"."cost_price", "core_product"."bar_code", "core_product"."accessible"
FROM "core_product" INNER JOIN "core_productgroup" ON ( "core_product"."product_group_id" = "core_productgroup"."id" )
WHERE "core_productgroup"."company_id" = 1056
ORDER BY "core_product"."id" ASC
LIMIT 200;

This query takes 28 seconds to return 200 rows, which is far too slow for our use case.

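As a first attempt to understand where the bottleneck might be, I removed the LIMIT 200, expecting the query to get slower. But without LIMIT 200 the query takes only 2 seconds and returns approximately 293,000 rows.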

How can returning all 293,000 matching rows be faster than returning only the first 200?

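I used EXPLAIN to see how the query plans for the two queries differ. It turns out that these two nearly identical queries get completely different plans. With LIMIT: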

                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=10.69..52229.70 rows=200 width=76)
   ->  Nested Loop  (cost=10.69..17054740.55 rows=65320 width=76)
         Join Filter: (core_product.product_group_id = core_productgroup.id)
         ->  Index Scan using core_product_pkey on core_product  (cost=0.00..3124799.28 rows=2957497 width=71)
         ->  Materialize  (cost=10.69..131.18 rows=314 width=13)
               ->  Bitmap Heap Scan on core_productgroup  (cost=10.69..129.61 rows=314 width=13)
                     Recheck Cond: (company_id = 1056)
                     ->  Bitmap Index Scan on core_productgroup_company_id  (cost=0.00..10.61 rows=314 width=0)
                           Index Cond: (company_id = 1056)

Without LIMIT:

                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Sort  (cost=110561.36..110724.66 rows=65320 width=76)
   Sort Key: core_product.id
   ->  Hash Join  (cost=133.54..102432.32 rows=65320 width=76)
         Hash Cond: (core_product.product_group_id = core_productgroup.id)
         ->  Seq Scan on core_product  (cost=0.00..90554.97 rows=2957497 width=71)
         ->  Hash  (cost=129.61..129.61 rows=314 width=13)
               ->  Bitmap Heap Scan on core_productgroup  (cost=10.69..129.61 rows=314 width=13)
                     Recheck Cond: (company_id = 1056)
                     ->  Bitmap Index Scan on core_productgroup_company_id  (cost=0.00..10.61 rows=314 width=0)
                           Index Cond: (company_id = 1056)

Is there any way to influence the query plan PostgreSQL chooses, so that it avoids the very inefficient plan it currently uses when LIMIT is present?

Detailed query plan with LIMIT:

                                                                                                                                                                                                                            QUERY PLAN                                                                                                                                                                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10.69..52229.70 rows=200 width=76) (actual time=41669.575..41681.069 rows=200 loops=1)
   Output: core_product.sales_price, core_product.recommended_price, core_productgroup.name, core_product.number, core_product.name, core_product.description, core_product.cost_price, core_product.bar_code, core_product.accessible, core_product.id
   ->  Nested Loop  (cost=10.69..17054740.55 rows=65320 width=76) (actual time=41669.573..41681.040 rows=200 loops=1)
         Output: core_product.sales_price, core_product.recommended_price, core_productgroup.name, core_product.number, core_product.name, core_product.description, core_product.cost_price, core_product.bar_code, core_product.accessible, core_product.id
         Join Filter: (core_product.product_group_id = core_productgroup.id)
         ->  Index Scan using core_product_pkey on public.core_product  (cost=0.00..3124799.28 rows=2957497 width=71) (actual time=0.033..803.265 rows=773270 loops=1)
               Output: core_product.id, core_product.product_group_id, core_product.name, core_product.sales_price, core_product.cost_price, core_product.recommended_price, core_product.accessible, core_product.volume, core_product.in_stock, core_product.on_order, core_product.ordered, core_product.available, core_product.bar_code, core_product.description, core_product.logical_timestamp, core_product.number, core_product.unit, core_product.uuid
         ->  Materialize  (cost=10.69..131.18 rows=314 width=13) (actual time=0.000..0.018 rows=300 loops=773270)
               Output: core_productgroup.name, core_productgroup.id
               ->  Bitmap Heap Scan on public.core_productgroup  (cost=10.69..129.61 rows=314 width=13) (actual time=0.073..0.140 rows=300 loops=1)
                     Output: core_productgroup.name, core_productgroup.id
                     Recheck Cond: (core_productgroup.company_id = 1056)
                     ->  Bitmap Index Scan on core_productgroup_company_id  (cost=0.00..10.61 rows=314 width=0) (actual time=0.060..0.060 rows=300 loops=1)
                           Index Cond: (core_productgroup.company_id = 1056)
 Total runtime: 41681.125 ms
(15 rows)

Detailed query plan without LIMIT:

                                                                                                                                                                                                                            QUERY PLAN                                                                                                                                                                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=110561.36..110724.66 rows=65320 width=76) (actual time=1733.710..1831.820 rows=292797 loops=1)
   Output: core_product.sales_price, core_product.recommended_price, core_productgroup.name, core_product.number, core_product.name, core_product.description, core_product.cost_price, core_product.bar_code, core_product.accessible, core_product.id
   Sort Key: core_product.id
   Sort Method: external merge  Disk: 28688kB
   ->  Hash Join  (cost=133.54..102432.32 rows=65320 width=76) (actual time=1.561..1239.564 rows=292797 loops=1)
         Output: core_product.sales_price, core_product.recommended_price, core_productgroup.name, core_product.number, core_product.name, core_product.description, core_product.cost_price, core_product.bar_code, core_product.accessible, core_product.id
         Hash Cond: (core_product.product_group_id = core_productgroup.id)
         ->  Seq Scan on public.core_product  (cost=0.00..90554.97 rows=2957497 width=71) (actual time=0.006..726.778 rows=3051563 loops=1)
               Output: core_product.id, core_product.product_group_id, core_product.name, core_product.sales_price, core_product.cost_price, core_product.recommended_price, core_product.accessible, core_product.volume, core_product.in_stock, core_product.on_order, core_product.ordered, core_product.available, core_product.bar_code, core_product.description, core_product.logical_timestamp, core_product.number, core_product.unit, core_product.uuid
         ->  Hash  (cost=129.61..129.61 rows=314 width=13) (actual time=0.186..0.186 rows=300 loops=1)
               Output: core_productgroup.name, core_productgroup.id
               Buckets: 1024  Batches: 1  Memory Usage: 13kB
               ->  Bitmap Heap Scan on public.core_productgroup  (cost=10.69..129.61 rows=314 width=13) (actual time=0.055..0.111 rows=300 loops=1)
                     Output: core_productgroup.name, core_productgroup.id
                     Recheck Cond: (core_productgroup.company_id = 1056)
                     ->  Bitmap Index Scan on core_productgroup_company_id  (cost=0.00..10.61 rows=314 width=0) (actual time=0.045..0.045 rows=300 loops=1)
                           Index Cond: (core_productgroup.company_id = 1056)
 Total runtime: 1883.235 ms
(18 rows)

jja*_*nes 6

The planner thinks it can walk through the rows in core_product.id order, quickly find 200 matches for company_id=1056, and be done.

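But that doesn't work, because all the rows with small core_product.id values are ones that don't have company_id=1056. (For example, company_id=1056 is your most recently added client, so all of their data falls at the high end of the id sequence. But PostgreSQL doesn't understand that.)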
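The planner's reasoning can be reconstructed from the cost figures in the question's plans. What follows is a rough sketch of the arithmetic, assuming the Limit node's cost is scaled linearly from the nested loop's cost by the fraction of rows requested, and that matching rows are spread evenly over the id range. All numbers are copied from the EXPLAIN output; the column aliases are only illustrative.

```sql
-- Numbers are taken from the EXPLAIN / EXPLAIN ANALYZE output in the question.
SELECT
    -- Limit cost = startup + (total - startup) * rows_wanted / rows_estimated
    round(10.69 + (17054740.55 - 10.69) * 200 / 65320, 2) AS estimated_limit_cost,
    -- Index rows the planner expects to read before it has 200 matches,
    -- assuming the 65320 estimated matches are spread evenly over 2957497 rows
    round(200 * 2957497.0 / 65320) AS expected_rows_scanned,
    -- Index rows actually read before the 200th match
    773270 AS actual_rows_scanned;
```

The first value comes out at about 52229.70, the cost shown on the Limit node, which is well below the 110561.36 startup cost of the sort-based plan. That estimate depends on reading only about 9,000 index rows, whereas the scan actually read 773,270 rows, each one compared against the 300 materialized product groups, before the 200th match turned up.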

You can probably force the plan you want by using a CTE and writing it like this:

with t as (
   <your query, without the limit, goes here>
)
select * from t limit 200;
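Filled in with the query from the question, the CTE approach might look like the sketch below. It relies on PostgreSQL treating a CTE as an optimization fence, which holds for the 9.1 release used here; from PostgreSQL 12 on, a CTE like this can be inlined unless it is declared AS MATERIALIZED. The alias group_name is an addition, not part of the original query, introduced so that the CTE does not expose two columns both called name.

```sql
-- Sketch: compute the full sorted result inside the CTE, then take 200 rows.
WITH t AS (
    SELECT "core_product"."sales_price", "core_product"."recommended_price",
           "core_productgroup"."name" AS group_name,  -- alias is an assumption
           "core_product"."number", "core_product"."name",
           "core_product"."description", "core_product"."cost_price",
           "core_product"."bar_code", "core_product"."accessible"
    FROM "core_product"
    INNER JOIN "core_productgroup"
        ON ("core_product"."product_group_id" = "core_productgroup"."id")
    WHERE "core_productgroup"."company_id" = 1056
    ORDER BY "core_product"."id" ASC
)
SELECT * FROM t LIMIT 200;
```

The ORDER BY stays inside the CTE, as in the answer. The outer query relies on reading the CTE's rows in that order, which is what happens in practice but is not strictly guaranteed by the SQL standard.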

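  • I had a similar problem, and I solved it simply by adding a column to the ORDER BY. If the columns in the ORDER BY don't match an index's columns, Postgres won't use that index and will choose a better one instead. So in this example, if you order by id plus bar_code rather than just id, Postgres will probably use the company_id index instead of the index on the id column. (2 upvotes)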
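One reading of the comment above, as a sketch: append a second sort column so that the ORDER BY no longer matches the primary key index exactly. Choosing bar_code follows the commenter's suggestion. Whether the planner then switches plans is not guaranteed, which is why the statement is wrapped in EXPLAIN. Assuming id is the primary key, as the index name core_product_pkey suggests, the extra column does not change the order of the returned rows.

```sql
-- Check which plan the planner picks once ORDER BY has a second column.
EXPLAIN
SELECT "core_product"."sales_price", "core_product"."recommended_price",
       "core_productgroup"."name", "core_product"."number",
       "core_product"."name", "core_product"."description",
       "core_product"."cost_price", "core_product"."bar_code",
       "core_product"."accessible"
FROM "core_product"
INNER JOIN "core_productgroup"
    ON ("core_product"."product_group_id" = "core_productgroup"."id")
WHERE "core_productgroup"."company_id" = 1056
ORDER BY "core_product"."id" ASC, "core_product"."bar_code" ASC
LIMIT 200;
```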

For*_*est 2

This link says that you cannot directly influence the join.

The query planner uses statistics about the tables to choose its plan, so you may see better performance after running ANALYZE on the tables.
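A minimal sketch of that suggestion, using the table names from the question. The plans in the question estimate 65320 joined rows where 292797 were actually returned, so fresher statistics could plausibly move the estimates. Whether that is enough to change the chosen plan is not certain, so the plan is re-checked afterwards.

```sql
-- Refresh planner statistics for both tables in the join.
ANALYZE core_product;
ANALYZE core_productgroup;

-- Re-run the slow query and compare estimated and actual row counts.
EXPLAIN ANALYZE
SELECT "core_product"."sales_price", "core_product"."recommended_price",
       "core_productgroup"."name", "core_product"."number",
       "core_product"."name", "core_product"."description",
       "core_product"."cost_price", "core_product"."bar_code",
       "core_product"."accessible"
FROM "core_product"
INNER JOIN "core_productgroup"
    ON ("core_product"."product_group_id" = "core_productgroup"."id")
WHERE "core_productgroup"."company_id" = 1056
ORDER BY "core_product"."id" ASC
LIMIT 200;
```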