min()/max() on a multi-column timestamp index

use*_*675 7 postgresql performance postgresql-performance

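I'm having trouble understanding why this query performs so many heap fetches. As I understand it, when the index contains no nulls (at either end), scanning it backward should be just as fast as scanning it forward, and vice versa.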

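I suspect the forward/backward scan is actually a red herring, but I can't identify any other meaningful difference in the explain output below.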

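Here is the table layout. I have anonymized the first two columns, which I believe are irrelevant to the problem, but kept them and their index for completeness.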

testqueuedb=> \d+ queue
                                                                  Table "public.queue"
        Column         |           Type           |                          Modifiers                          | Storage  | Stats target | Description
-----------------------+--------------------------+-------------------------------------------------------------+----------+--------------+-------------
 foo                   | character varying(64)    | not null                                                    | extended |              |
 bar                   | numeric(6,0)             | not null                                                    | main     |              |
 worker                | character varying(32)    | not null                                                    | extended |              |
 queued                | timestamp with time zone | not null default (timeofday())::timestamp without time zone | plain    |              |
Indexes:
    "queue_idx_job" btree (foo, bar, worker)
    "queue_idx_worker" btree (worker, queued)
Foreign-key constraints:
    "queue_fk_worker" FOREIGN KEY (worker) REFERENCES workers(worker)

Here is the explain output for the min and max queries.

testqueuedb=> explain (analyze, buffers) select min(queued) from queue where worker = 'workername';
                                                                        QUERY PLAN                              
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.59..0.60 rows=1 width=0) (actual time=1019.490..1019.490 rows=1 loops=1)
   Buffers: shared hit=20194 read=1
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.42..0.59 rows=1 width=8) (actual time=1019.485..1019.486 rows=1 loops=1)
           Buffers: shared hit=20194 read=1
           ->  Index Only Scan using queue_idx_worker on queue  (cost=0.42..55480.93 rows=330371 width=8) (actual time=1019.483..1019.483 rows=1 loops=1)
                 Index Cond: ((worker = 'workername'::text) AND (queued IS NOT NULL))
                 Heap Fetches: 20224
                 Buffers: shared hit=20194 read=1
 Planning time: 0.197 ms
 Execution time: 1019.529 ms
(11 rows)

testqueuedb=> explain (analyze, buffers) select max(queued) from queue where worker = 'workername';
                                                                         QUERY PLAN                             
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.59..0.60 rows=1 width=0) (actual time=0.508..0.509 rows=1 loops=1)
   Buffers: shared hit=2 read=3
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.42..0.59 rows=1 width=8) (actual time=0.503..0.503 rows=1 loops=1)
           Buffers: shared hit=2 read=3
           ->  Index Only Scan Backward using queue_idx_worker on queue  (cost=0.42..55480.93 rows=330371 width=8) (actual time=0.502..0.502 rows=1 loops=1)
                 Index Cond: ((worker = 'workername'::text) AND (queued IS NOT NULL))
                 Heap Fetches: 1
                 Buffers: shared hit=2 read=3
 Planning time: 0.215 ms
 Execution time: 0.546 ms
(11 rows)

I find the heap fetches in the first example especially confusing. Does it all come down to buffering?
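For reference, both plans have the Limit-over-index-scan shape that PostgreSQL produces when it rewrites min()/max() into an ordered single-row lookup. A minimal sketch of roughly equivalent hand-written queries, assuming the same table and worker value as in the plans above, might look like this:

```sql
-- Roughly what min(queued) becomes: walk queue_idx_worker forward
-- and stop at the first visible row for this worker.
SELECT queued
FROM queue
WHERE worker = 'workername'
  AND queued IS NOT NULL
ORDER BY queued ASC
LIMIT 1;

-- Roughly what max(queued) becomes: the same walk, backward.
SELECT queued
FROM queue
WHERE worker = 'workername'
  AND queued IS NOT NULL
ORDER BY queued DESC
LIMIT 1;
```

Running each of these under explain (analyze, buffers) may help show whether the Heap Fetches gap follows the scan direction itself or the end of the index being read.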

The Postgres version is 9.5.5.

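The table holds about 500,000 rows per worker, and there are very few distinct workers (fewer than ten). That makes me think the index isn't well structured to begin with, but I'm interested in the difference between these two queries regardless.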
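An index-only scan has to fetch the heap tuple whenever the page it points to is not marked all-visible in the visibility map, so one way to probe the Heap Fetches figure is to check how much of the table is marked all-visible and how many dead tuples it carries. The following is a sketch against the standard catalog and statistics views, assuming the table is public.queue as shown above. It is a diagnostic starting point, not a confirmed explanation.

```sql
-- Pages marked all-visible versus total pages (both are estimates
-- maintained by VACUUM and ANALYZE).
SELECT relpages, relallvisible
FROM pg_class
WHERE oid = 'public.queue'::regclass;

-- Live/dead tuple estimates and the last time the table was vacuumed.
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE schemaname = 'public'
  AND relname = 'queue';
```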

dla*_*and 1

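I'm pretty sure this is due to the choice of timestamptz as the data type of the queued column. Because of time zone considerations, Postgres has to visit all the rows to make sure it finds the true maximum. That is why the index scan shows such a high count.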

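You should change the queued data type to int (or bigint) and use a sequence to auto-increment it. (You can of course keep a timestamp column if you need the value.)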
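The suggestion above could be sketched as follows. The column name queued_seq and the index name queue_idx_worker_seq are hypothetical, and the sketch keeps the existing queued timestamp alongside the new column rather than replacing it.

```sql
-- Hypothetical column name; bigserial creates and attaches a sequence.
-- Adding it rewrites the table, and existing rows are numbered in the
-- order the rewrite visits them, not necessarily in queued order.
ALTER TABLE queue ADD COLUMN queued_seq bigserial;

-- Hypothetical index mirroring queue_idx_worker, keyed on the new column.
CREATE INDEX queue_idx_worker_seq ON queue (worker, queued_seq);

-- The oldest and newest entries for a worker, by sequence number.
SELECT min(queued_seq), max(queued_seq)
FROM queue
WHERE worker = 'workername';
```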