Slow Postgres query using ORDER BY, an index, and LIMIT

Question

Zee*_*han 7 postgresql performance order-by limits postgresql-9.6 query-performance

I am trying to improve the performance of a query on Postgres 9.6. Here is my schema; the table contains about 60 million rows.

          Column          |            Type             | Modifiers
--------------------------+-----------------------------+-----------
 transaction_id           | text                        | not null
 network_merchant_name    | text                        |
 network_merchant_id      | text                        |
 network_merchant_mcc     | integer                     |
 network_merchant_country | text                        |
 issuer_country           | text                        |
 merchant_id              | text                        |
 remapped_merchant_id     | text                        |
 created_at               | timestamp without time zone |
 updated_at               | timestamp without time zone |
 remapped_at              | timestamp without time zone |
Indexes:
    "mapped_transactions_pkey" PRIMARY KEY, btree (transaction_id)
    "ix_mapped_transactions_remapped_at" btree (remapped_at NULLS FIRST)

This is the query I am trying to run.

SELECT *
FROM mapped_transactions
ORDER BY remapped_at ASC NULLS FIRST
LIMIT 10000;

This is the query plan:

    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..1511.67 rows=10000 width=146) (actual time=327049.374..327345.341 rows=10000 loops=1)
   Buffers: shared hit=574937 read=210425 dirtied=356 written=4457
   I/O Timings: read=146625.381 write=59.637
   ->  Index Scan using ix_mapped_transactions_remapped_at on mapped_transactions  (cost=0.57..16190862.91 rows=107145960 width=146) (actual time=327049.364..327339.402 rows=10000 loops=1)
         Buffers: shared hit=574937 read=210425 dirtied=356 written=4457
         I/O Timings: read=146625.381 write=59.637
 Planning time: 0.125 ms
 Execution time: 327348.322 ms
(8 rows)

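I don't understand why this takes so long when there is an index on the remapped_at column.

On the other hand, if I sort in the reverse order, it is fast.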

SELECT *
FROM mapped_transactions
ORDER BY remapped_at DESC NULLS LAST
LIMIT 10000;

The plan is:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..1511.67 rows=10000 width=146) (actual time=0.020..9.268 rows=10000 loops=1)
   Buffers: shared hit=1802
   ->  Index Scan Backward using ix_mapped_transactions_remapped_at on mapped_transactions  (cost=0.57..16190848.04 rows=107145866 width=146) (actual time=0.018..4.759 rows=10000 loops=1)
         Buffers: shared hit=1802
 Planning time: 0.080 ms
 Execution time: 11.561 ms
(6 rows)

Can someone help me improve the performance of the first query?

Update

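I solved the problem by rebuilding the table and reindexing the data. VACUUM FULL ANALYZE is not an option because the table is in use and I don't want to lock it.
The index's performance is degrading quickly. I rebuilt the index 7 hours ago and performance was good; now the query takes about 10 seconds. Note that this table is write-heavy. How can I keep the index fast? Do I have to reindex the table frequently? The table has no deletes, but it does have a lot of updates.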
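
One way to check whether dead row versions left behind by the updates explain the degradation is to look at the table's statistics and, if autovacuum rarely reaches the table, make autovacuum more aggressive for this table only. The following is a minimal sketch, not a confirmed fix: the 0.01 scale factor is an illustrative assumption rather than a value from the question, and it should be tuned against the actual workload.

```sql
-- Dead-tuple count, update counters, and last (auto)vacuum times for the table.
SELECT relname, n_live_tup, n_dead_tup, n_tup_upd, n_tup_hot_upd,
       last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'mapped_transactions';

-- Plain VACUUM (without FULL) does not take an exclusive lock,
-- so normal reads and writes can continue while it runs.
VACUUM (VERBOSE, ANALYZE) mapped_transactions;

-- Assumed value: trigger autovacuum after ~1% of rows change
-- instead of the default 20% (autovacuum_vacuum_scale_factor = 0.2).
ALTER TABLE mapped_transactions
    SET (autovacuum_vacuum_scale_factor = 0.01);
```

A large n_dead_tup together with an old last_autovacuum would suggest that the forward scan is wading through index entries that point to dead rows. Whether that is the cause here cannot be confirmed from the plans alone.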

Answer 1

Zee*_*han 0

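I ran some more tests, and it seems this solution does not work for me. The index keeps degrading over time, and unless I reindex (or drop and recreate the index), the query I want to run gets slower and slower. There are ways to create an index without blocking other reads and writes, but I am not going to use them for now because they do not scale easily. I cannot solve this with the knowledge I have, so I will take a different approach, one that uses an index on a column that stays constant and is never updated. Thank you all for helping me.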
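
For reference, the non-blocking rebuild mentioned above can be sketched as follows on 9.6, which has no REINDEX CONCURRENTLY (that arrived in PostgreSQL 12). This is an illustration under assumptions, not a recommendation from the thread: the `_new` suffix is a hypothetical temporary name, and each CONCURRENTLY statement has to run outside a transaction block.

```sql
-- Build a fresh copy of the index without blocking writes to the table.
-- "ix_mapped_transactions_remapped_at_new" is a hypothetical temporary name.
CREATE INDEX CONCURRENTLY ix_mapped_transactions_remapped_at_new
    ON mapped_transactions (remapped_at NULLS FIRST);

-- Drop the bloated index, again without blocking concurrent reads and writes.
DROP INDEX CONCURRENTLY ix_mapped_transactions_remapped_at;

-- Restore the original name; this takes only a brief lock.
ALTER INDEX ix_mapped_transactions_remapped_at_new
    RENAME TO ix_mapped_transactions_remapped_at;
```

If the concurrent build fails partway, it leaves behind an invalid index that has to be dropped before retrying. The rebuild also treats only the symptom, so it would need to be repeated as long as the update pattern keeps bloating the index.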

Archived:	8 years, 7 months ago
Views:	3,985
Last activity:	8 years, 7 months ago