Extremely slow query on an indexed column in Postgres

dav*_*ids 4 postgresql performance order-by index-tuning amazon-rds

I have a very slow query on an indexed column. Given the query

SELECT * 
FROM orders 
WHERE shop_id = 3828 
ORDER BY updated_at desc 
LIMIT 1

explain analyze returns:

    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..594.45 rows=1 width=175) (actual time=202106.830..202106.831 rows=1 loops=1)
   ->  Index Scan Backward using index_orders_on_updated_at on orders  (cost=0.43..267901.54 rows=451 width=175) (actual time=202106.827..202106.827 rows=1 loops=1)
         Filter: (shop_id = 3828)
         Rows Removed by Filter: 1604818
 Planning time: 98.579 ms
 Execution time: 202127.514 ms
(6 rows)

The table description is:

                                         Table "public.orders"
       Column       |            Type             |                           Modifiers
--------------------+-----------------------------+---------------------------------------------------------------
 id                 | integer                     | not null default nextval('orders_id_seq'::regclass)
 sent               | boolean                     | default false
 created_at         | timestamp without time zone |
 updated_at         | timestamp without time zone |
 name               | character varying(255)      |
 shop_id            | integer                     |
 recovered_at       | timestamp without time zone |
 total_price        | double precision            |
Indexes:
    "orders_pkey" PRIMARY KEY, btree (id)
    "index_orders_on_recovered_at" btree (recovered_at)
    "index_orders_on_shop_id" btree (shop_id)
    "index_orders_on__updated_at" btree (updated_at)

It is a Postgres server running on an AWS RDS t2 micro instance.
The table has about 2.6 million rows.

Erw*_*ter 7

There is a subtle problem lurking in your ORDER BY clause:

ORDER BY updated_at DESC 

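This sorts NULL values first. I assume you don't want that. Your column updated_at can be NULL (it lacks a NOT NULL constraint). The missing constraint should probably be added. Your query should be fixed either way: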

SELECT * 
FROM   orders 
WHERE  shop_id = 3828 
ORDER  BY updated_at DESC NULLS LAST
LIMIT  1;
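To see the default NULL placement in isolation, here is a minimal sketch that can be run in any psql session. The inline values and the alias t(v) are made up for illustration and are not part of the question's schema.

```sql
-- In Postgres, DESC implies NULLS FIRST unless stated otherwise.
SELECT v
FROM   (VALUES (1), (NULL), (3)) AS t(v)
ORDER  BY v DESC;
-- row order: NULL, 3, 1

-- With NULLS LAST the real values come first.
SELECT v
FROM   (VALUES (1), (NULL), (3)) AS t(v)
ORDER  BY v DESC NULLS LAST;
-- row order: 3, 1, NULL
```

With LIMIT 1, the first form would return a row whose updated_at is NULL if the shop has one, which is rarely the "latest" order anyone wants.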

The multicolumn index @Ste Bov mentioned should be adapted accordingly:

CREATE INDEX orders_shop_id_updated_at_idx ON orders (shop_id, updated_at DESC NULLS LAST);

Then you get a basic Index Scan instead of the (almost as fast) Index Scan Backward, and you won't get an additional index condition: Index Cond: (updated_at IS NOT NULL).
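One way to confirm that the planner actually picks the new index is to refresh statistics and look at the plan again. This is only a sketch. It assumes the index from this answer already exists, and the plan you see will depend on your data and Postgres version.

```sql
-- Refresh planner statistics for the table after creating the index.
ANALYZE orders;

-- Re-run the query with timing and buffer usage.
-- The hoped-for plan has an Index Cond on shop_id and no "Rows Removed by Filter".
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   orders
WHERE  shop_id = 3828
ORDER  BY updated_at DESC NULLS LAST
LIMIT  1;
```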

Aside

You can save some wasted disk space by optimizing the column order of a big table (which makes everything a bit faster):

id                 | integer                     | not null default nextval( ...
shop_id            | integer                     |
sent               | boolean                     | default false
name               | varchar(255)                |
total_price        | double precision            |
recovered_at       | timestamp without time zone |
created_at         | timestamp without time zone |
updated_at         | timestamp without time zone |
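The saving comes from alignment padding between columns, so it helps to look at the size and alignment of each column's type before reordering. A sketch against the system catalogs, assuming the table lives in the public schema as shown in the question:

```sql
-- typlen:   fixed size in bytes, or -1 for variable-length types such as varchar
-- typalign: c = 1-byte, s = 2-byte, i = 4-byte, d = 8-byte alignment
SELECT a.attnum, a.attname, t.typname, t.typlen, t.typalign
FROM   pg_attribute a
JOIN   pg_type t ON t.oid = a.atttypid
WHERE  a.attrelid = 'public.orders'::regclass
AND    a.attnum > 0          -- skip system columns
AND    NOT a.attisdropped    -- skip dropped columns
ORDER  BY a.attnum;
```

Postgres cannot reorder the columns of an existing table in place, so applying a new order means creating a new table and copying the data over.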

Add NOT NULL constraints to all columns that can't be NULL.
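For updated_at, that could look like the sketch below. Whether the column may really never be NULL, and whether created_at is a sensible fallback for existing NULLs, are assumptions about your data that the question does not confirm.

```sql
-- Check how many rows would violate the constraint.
SELECT count(*) FROM orders WHERE updated_at IS NULL;

-- Hypothetical backfill: use created_at where updated_at is missing.
UPDATE orders
SET    updated_at = created_at
WHERE  updated_at IS NULL;

-- Fails if any NULL remains (for example where created_at is NULL as well).
-- Scans the whole table while holding an exclusive lock.
ALTER TABLE orders ALTER COLUMN updated_at SET NOT NULL;
```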

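Consider text or varchar instead of varchar(255), timestamptz instead of timestamp, and integer for the price (as cents) or numeric (for fractional numbers), which is an arbitrary-precision type that stores your values exactly as given. Never use a lossy floating-point type for a "price" or anything to do with money.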
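The floating-point problem is easy to reproduce, and the type changes can be made with ALTER TABLE. The sketch below rests on assumptions that are not in the question: that the stored timestamps are in UTC, and that numeric(12,2) is a suitable precision for total_price.

```sql
-- double precision cannot represent 0.1 or 0.2 exactly; numeric can.
SELECT 0.1::float8  + 0.2::float8  = 0.3::float8  AS float_equal,    -- false
       0.1::numeric + 0.2::numeric = 0.3::numeric AS numeric_equal;  -- true

-- Hypothetical migration. This rewrites the table under an exclusive lock,
-- and existing prices are rounded to two decimal places.
ALTER TABLE orders
    ALTER COLUMN name        TYPE text,
    ALTER COLUMN total_price TYPE numeric(12,2),
    ALTER COLUMN created_at  TYPE timestamptz USING created_at AT TIME ZONE 'UTC';
```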


Ste*_*Bov 5

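I don't know PostgreSQL very well, but you are checking two separate keys to find the value you are looking for. Try creating them as a composite key instead: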

"index_orders_on_shop_id" btree (shop_id)
"index_orders_on__updated_at" btree (updated_at)

becomes

"index_orders_on_shop_id__updated_at" btree (shop_id,updated_at)

This may help.

It would be even better if there is a way to include values in the index.
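PostgreSQL 11 and later do support this through the INCLUDE clause of CREATE INDEX. A sketch, with a made-up index name and total_price chosen purely as an example payload column; the Postgres version in the question is unknown, so treat this as an option rather than a given:

```sql
-- Requires PostgreSQL 11 or later.
CREATE INDEX orders_shop_id_updated_at_incl_idx
    ON orders (shop_id, updated_at DESC NULLS LAST)
    INCLUDE (total_price);

-- Only a query restricted to indexed or included columns can use an
-- index-only scan; SELECT * still has to visit the table.
SELECT updated_at, total_price
FROM   orders
WHERE  shop_id = 3828
ORDER  BY updated_at DESC NULLS LAST
LIMIT  1;
```

Index-only scans also depend on the visibility map being reasonably current, so the table needs to be vacuumed regularly for this to pay off.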