Optimizing a Bitmap Heap Scan

Question

ins*_*ide 6 postgresql performance optimization query-performance postgresql-performance

I'm trying to understand why my query takes so long, even though I have indexed the required columns:

SELECT entity_id,
       id,
       report_date
FROM own_inst_detail
WHERE ( own_inst_detail.id = 'P7M7WC-S' )
  AND ( own_inst_detail.report_date >= '2017-02-01T17:29:49.661Z' )
  AND ( own_inst_detail.report_date <= '2018-08-01T17:29:49.663Z' )

Cached result of EXPLAIN ANALYZE:

Bitmap Heap Scan on own_inst_detail (cost=20.18..2353.55 rows=597 width=22) (actual time=1.471..6.955 rows=4227 loops=1)
  Recheck Cond: ((id = 'P7M7WC-S'::bpchar) AND (report_date >= '2017-06-01'::date) AND (report_date <= '2018-08-01'::date))
  Heap Blocks: exact=4182
  ->  Bitmap Index Scan on own_inst_detail  (cost=0.00..20.03 rows=597 width=0) (actual time=0.901..0.901 rows=4227 loops=1)
        Index Cond: ((id = 'P7M7WC-S'::bpchar) AND (report_date >= '2017-06-01'::date) AND (report_date <= '2018-08-01'::date))
Planning time: 0.123 ms
Execution time: 7.801 ms

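This part of the query accounts for 4 of the 5 seconds my full query takes in total.

I have a combined index on id and report_date. I also have two separate indexes on these columns.

I tried setting a high work_mem as well as lowering random_page_cost, but nothing helped.

Any additional suggestions would be appreciated.

I found a similar question, How to index WHERE (start_date >= '2013-12-15'), which suggests adding a B-tree index, but I already have an index on report_date.

Create table script: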

CREATE TABLE IF NOT EXISTS public.own_inst_detail (
    entity_id character(8) NOT NULL,
    id character(8) NOT NULL,
    report_date date NOT NULL,
    PRIMARY KEY(report_date)
);

Indexes:

CREATE INDEX indx_own_inst_detail_report_date_desc ON own_inst_detail (report_date DESC NULLS LAST);

CREATE INDEX indx_own_inst_detail_id_report_date_desc ON own_inst_detail (id, report_date DESC NULLS LAST);

Answer 1

Erw*_*ter 8

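Data type

The data type character(n) is almost always the wrong choice. It's a "legacy" type I would not use any more. It exhibits surprising behavior, and there is nothing it does that text / varchar / varchar(n) cannot do better.

Any downsides of using data type “text” for storing strings?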
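A quick way to see the surprising behavior is to compare how a few built-in functions treat the blank padding of character(n). This is a minimal sketch based on the documented semantics of character(n) and varchar; the values are made-up literals, and the ALTER TABLE line in the last comment is only a hypothetical illustration of switching the question's id column to text.

```sql
-- character(n) stores values blank-padded to n characters, but most
-- operations treat that padding as insignificant. varchar/text do not pad.
SELECT length('ab'::char(5))        AS char_length,     -- 2: padding ignored
       octet_length('ab'::char(5))  AS stored_bytes,    -- 5: padding is stored anyway
       'ab'::char(5)::text          AS as_text,         -- 'ab': padding dropped on cast
       length('ab   '::varchar(5))  AS varchar_length;  -- 5: trailing spaces are kept

-- Hypothetical migration for the question's table (not run here):
-- ALTER TABLE own_inst_detail ALTER COLUMN id TYPE text;
```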

And you mix the type date in your table with timestamp literals in the query. While this works, you should at least add explicit type declarations to avoid surprising results. Better yet, pass actual dates in the query, or cast the input explicitly. Like:

report_date >= timestamp '2017-02-01T17:29:49.661Z'

Or:

report_date >= date '2017-02-02'

The manual about constants and type casts.
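To make the difference concrete: an untyped string literal compared with a date column is resolved as date, so its time part is silently dropped (the question's plan shows exactly this, '...'::date). A literal typed as timestamp instead makes Postgres promote the date to midnight and compare it with the full timestamp. That is why report_date >= timestamp '2017-02-01T17:29:49.661Z' and report_date >= date '2017-02-02' select the same rows, while the original untyped literal behaves like date '2017-02-01'. A small sketch using literal values only:

```sql
-- Untyped literal vs. a date: the literal is resolved as date,
-- so the time of day is discarded.
SELECT '2017-02-01T17:29:49.661Z'::date                           AS literal_as_date,  -- 2017-02-01
       date '2017-02-01' >= '2017-02-01T17:29:49.661Z'            AS untyped_bound,    -- true
       -- Typed timestamp literal: the date is compared as 2017-02-01 00:00.
       date '2017-02-01' >= timestamp '2017-02-01T17:29:49.661Z'  AS timestamp_bound,  -- false
       date '2017-02-02' >= timestamp '2017-02-01T17:29:49.661Z'  AS next_day;         -- true
```

Note that timestamp (without time zone) silently ignores the trailing Z; with timestamptz, the result would also depend on the session's TimeZone setting.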

Index

Your index indx_own_inst_detail_id_report_date_desc on (id, report_date DESC NULLS LAST) looks right for the query, and the query plan you provided shows it is put to good use. Execution time: 7.801 ms looks good, too.

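If your "uncached" query takes 4 seconds, you may have to fix your hardware or your server configuration, or both. Slow storage and not enough RAM for caching? More work_mem won't help with that, and may even make matters worse by taking RAM away from the cache. Related:

Increasing work_mem and shared_buffers on Postgres 9.2 significantly slows down queries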
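One way to check whether those 4 seconds are spent on I/O is EXPLAIN (ANALYZE, BUFFERS): "shared hit" counts pages already in shared_buffers, "read" counts pages that had to be fetched from the OS cache or disk. The sketch below uses a hypothetical table own_inst_detail_demo filled with made-up data, since the real table's size and distribution are unknown. On the real table, compare a slow first run with a fast repeat run.

```sql
-- Hypothetical stand-in for own_inst_detail, filled with made-up data.
DROP TABLE IF EXISTS own_inst_detail_demo;
CREATE TABLE own_inst_detail_demo (
    entity_id   text NOT NULL,
    id          text NOT NULL,
    report_date date NOT NULL
);

INSERT INTO own_inst_detail_demo (entity_id, id, report_date)
SELECT 'E' || lpad(g::text, 7, '0'),           -- made-up entity ids
       'ID' || lpad((g % 499)::text, 6, '0'),  -- 499 made-up 8-char ids
       date '2016-01-01' + (g % 1000)          -- dates spread over ~3 years
FROM generate_series(1, 200000) AS g;

CREATE INDEX ON own_inst_detail_demo (id, report_date DESC NULLS LAST);
ANALYZE own_inst_detail_demo;

-- "shared hit" = pages found in shared_buffers,
-- "read"       = pages not in shared_buffers (OS cache or disk).
-- A cold-cache run shows mostly "read"; a warm repeat run mostly "hit".
EXPLAIN (ANALYZE, BUFFERS)
SELECT entity_id, id, report_date
FROM   own_inst_detail_demo
WHERE  id = 'ID000042'
AND    report_date >= date '2017-02-02'
AND    report_date <= date '2018-08-01';
```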

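If you have enough RAM and proper memory settings, you may be looking at a cold-cache problem: only the first call (or the first few) is slow. If that's an issue, consider pg_prewarm. See:

PostgreSQL: Force data into memory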
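A minimal pg_prewarm sketch, assuming the contrib module is available on the server. The table and index names are hypothetical stand-ins for own_inst_detail and its index. pg_prewarm(regclass) reads every block of the relation into shared_buffers (default mode 'buffer') and returns the number of blocks it loaded.

```sql
-- Assumes the pg_prewarm contrib extension can be installed.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Hypothetical demo table and index (names made up for this sketch).
DROP TABLE IF EXISTS own_inst_detail_demo;
CREATE TABLE own_inst_detail_demo (
    entity_id   text NOT NULL,
    id          text NOT NULL,
    report_date date NOT NULL
);
INSERT INTO own_inst_detail_demo (entity_id, id, report_date)
SELECT 'E' || g, 'ID' || (g % 499), date '2016-01-01' + (g % 1000)
FROM generate_series(1, 50000) AS g;
CREATE INDEX own_inst_detail_demo_id_date_idx
    ON own_inst_detail_demo (id, report_date DESC NULLS LAST);

-- Load all blocks of the table and of the index into shared_buffers.
-- Each call returns the number of blocks it prewarmed.
SELECT pg_prewarm('own_inst_detail_demo')             AS table_blocks,
       pg_prewarm('own_inst_detail_demo_id_date_idx') AS index_blocks;
```

Prewarmed pages can still be evicted later, so this mostly helps right after a restart or before a known batch of queries. Since PostgreSQL 11, the same module can also restore the previous buffer contents automatically after a restart (autoprewarm) when pg_prewarm is listed in shared_preload_libraries.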

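If your table is vacuumed enough (or mostly read-only), you might benefit from an index-only scan if you append the additional column entity_id from your SELECT list to the index:

CREATE INDEX ON own_inst_detail (id, report_date DESC NULLS LAST, entity_id);

This might be particularly useful in your case, since Postgres then only has to read the index and, for pages marked all-visible in the visibility map, does not have to visit the table at all (eliminating the Bitmap Heap Scan entirely). That may help with your disk / cold-cache bottleneck.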

Related:

Can Postgres use an index-only scan for this query with joined tables?
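A sketch of the same idea, with two assumptions: it runs on a hypothetical copy of the table, and it uses the INCLUDE clause, which only exists in PostgreSQL 11 or later (the three-column index above works on any version). With INCLUDE, entity_id is stored in the index as payload but is not part of the search key.

```sql
-- Hypothetical demo table (stand-in for own_inst_detail).
DROP TABLE IF EXISTS own_inst_detail_demo;
CREATE TABLE own_inst_detail_demo (
    entity_id   text NOT NULL,
    id          text NOT NULL,
    report_date date NOT NULL
);

-- PostgreSQL 11+: (id, report_date) is the search key,
-- entity_id is carried along only so the query can be answered from the index.
CREATE INDEX own_inst_detail_demo_covering_idx
    ON own_inst_detail_demo (id, report_date DESC NULLS LAST)
    INCLUDE (entity_id);

-- With real data, after VACUUM has set the visibility map, the plan can show
-- "Index Only Scan" with a low (ideally zero) "Heap Fetches" count.
EXPLAIN (ANALYZE)
SELECT entity_id, id, report_date
FROM   own_inst_detail_demo
WHERE  id = 'P7M7WC-S'
AND    report_date BETWEEN date '2017-02-02' AND date '2018-08-01';
```

Compared to appending entity_id as a third key column, INCLUDE keeps the search key narrower; for this query, both variants should allow an index-only scan.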

@inside: Hard to say. Depends on the complete picture. What you describe sounds like a "cold cache". I added some more above. (2 upvotes)

Asked:	7 years, 3 months ago
Viewed:	9102 times
Last active:	3 years, 1 month ago