How can I speed up querying the last value in a time series?

iva*_*rec 6 postgresql performance index-tuning greatest-n-per-group postgresql-10 postgresql-performance

I have a time series table prices in a PostgreSQL 10 DB.
Here is a simplified test case to illustrate the problem:

CREATE TABLE prices (
    currency text NOT NULL,
    side     boolean NOT NULL,
    price    numeric NOT NULL,
    ts       timestamptz NOT NULL
);

I want to quickly query the last value for each currency/side pair, because that gives me the current buy/sell price for each currency.

My current solution is:

create index on prices (currency, side, ts desc);

select distinct on (currency, side) *
  from prices
 order by currency, side, ts desc;

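But this makes the query very slow (~500 ms) on this table, which has only ~30k rows.

The actual table has four columns I want to group by instead of two. Here is what the actual table and query look like: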

create table prices (
    exchange integer not null,
    pair text not null,
    side boolean not null,
    guaranteed_volume numeric not null,
    ts timestamp with time zone not null,
    price numeric not null,
    constraint prices_pkey primary key (exchange, pair, side, guaranteed_volume, ts),
    constraint prices_exchange_fkey foreign key (exchange)
        references exchanges (id) match simple
        on update no action
        on delete no action
);

create index prices_exchange_pair_side_guaranteed_volume_ts_idx
      on prices (exchange, pair, side, guaranteed_volume, ts desc);

create view last_prices as
select distinct on (exchange, pair, side, guaranteed_volume)
       exchange
     , pair
     , side
     , guaranteed_volume
     , price
     , ts
  from prices
 order by exchange
        , pair
        , side
        , guaranteed_volume
        , ts desc;

There are currently 34441 rows. Some useful debugging queries:

# explain (analyze,buffers) select * from last_prices;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=2662.03..2997.71 rows=1224 width=37) (actual time=403.218..459.041 rows=392 loops=1)
   Buffers: shared hit=418
   ->  Sort  (cost=2662.03..2729.17 rows=26854 width=37) (actual time=403.213..411.041 rows=28353 loops=1)
         Sort Key: prices.exchange, prices.pair, prices.side, prices.guaranteed_volume, prices.ts DESC
         Sort Method: quicksort  Memory: 2984kB
         Buffers: shared hit=418
         ->  Seq Scan on prices  (cost=0.00..686.54 rows=26854 width=37) (actual time=0.022..31.407 rows=28353 loops=1)
               Buffers: shared hit=418
 Planning time: 0.911 ms
 Execution time: 460.190 ms

EXPLAIN ANALYZE with seqscan disabled:

# explain (analyze,buffers) select * from last_prices;
                                                                                  QUERY PLAN                                                                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.41..4458.07 rows=1224 width=37) (actual time=0.037..122.237 rows=392 loops=1)
   Buffers: shared hit=15182
   ->  Index Scan using prices_exchange_pair_side_guaranteed_volume_ts_idx on prices  (cost=0.41..4189.53 rows=26854 width=37) (actual time=0.034..91.237 rows=29649 loops=1)
         Buffers: shared hit=15182
 Planning time: 0.291 ms
 Execution time: 122.417 ms

The same query run directly, without going through the view:

# explain (analyze, buffers)
select distinct on (exchange, pair, side, guaranteed_volume)
       exchange
     , pair
     , side
     , guaranteed_volume
     , price
     , ts
  from prices
 order by exchange
        , pair
        , side
        , guaranteed_volume
        , ts desc;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=2163.56..2429.99 rows=1224 width=37) (actual time=364.716..391.405 rows=380 loops=1)
   Buffers: shared hit=418
   ->  Sort  (cost=2163.56..2216.85 rows=21314 width=37) (actual time=364.711..370.458 rows=24011 loops=1)
         Sort Key: exchange, pair, side, guaranteed_volume, ts DESC
         Sort Method: quicksort  Memory: 2644kB
         Buffers: shared hit=418
         ->  Seq Scan on prices  (cost=0.00..631.14 rows=21314 width=37) (actual time=0.025..13.751 rows=24011 loops=1)
               Buffers: shared hit=418
 Planning time: 0.258 ms
 Execution time: 392.110 ms

Erw*_*ter 5

"I want to quickly query the last value for each currency/side pair"

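DISTINCT ON is good for few rows per combination of interest. But your use case obviously has many rows per distinct (currency, side), so DISTINCT ON is a poor choice performance-wise. You will find a detailed assessment and an arsenal of solutions in these two related answers on SO: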

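If you only need the latest timestamp ts, that column doubles as the sort criterion and the desired return value, which makes the case very simple. Look at Evan's simple solution with max(ts).

(Well, ideally you would have an index on (currency, side, ts desc NULLS LAST), since max(ts) ignores NULL values and matches this sort order better. But that does not matter for columns defined NOT NULL.)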
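A minimal sketch of the max(ts) idea, assuming only the latest timestamp per (currency, side) is needed. It is not necessarily Evan's exact query. It runs against a throwaway temporary copy of the simplified prices table; the sample rows are made up for illustration.

```sql
-- Throwaway copy of the simplified test table (sample rows are made up)
CREATE TEMP TABLE prices (
    currency text        NOT NULL,
    side     boolean     NOT NULL,
    price    numeric     NOT NULL,
    ts       timestamptz NOT NULL
);

INSERT INTO prices VALUES
    ('EUR', true,  1.10, '2018-01-01 10:00+00'),
    ('EUR', true,  1.12, '2018-01-01 10:05+00'),
    ('EUR', false, 1.09, '2018-01-01 10:03+00'),
    ('USD', true,  1.00, '2018-01-01 09:59+00');

-- Sort order matching max(ts); NULLS LAST is moot here because ts is NOT NULL
CREATE INDEX ON prices (currency, side, ts DESC NULLS LAST);

-- Latest timestamp per (currency, side); the matching price is NOT returned
SELECT currency, side, max(ts) AS last_ts
FROM   prices
GROUP  BY currency, side
ORDER  BY currency, side;
```

As written, the GROUP BY still aggregates over every row of prices, and it cannot return the price belonging to the latest ts, which is what the next paragraph addresses.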

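Typically, you need additional columns from each selected row (like the current price!) and/or you need to sort by multiple columns, so you have to do more.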

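Ideally, you have another table listing all currencies, plus an FK constraint to enforce referential integrity and disallow nonexistent currencies. Then use the query technique from the chapter "2a. LATERAL join" in the linked answer, extended to account for the added side: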
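A minimal sketch of such a lookup table, assuming it is called currency with a single text key column, as the query below implies. The constraint name prices_currency_fkey is hypothetical, and both tables are temporary here only so the sketch can be run without touching real data.

```sql
-- Lookup table: one row per known currency
CREATE TEMP TABLE currency (
    currency text PRIMARY KEY
);

-- prices may only reference currencies listed in the lookup table
CREATE TEMP TABLE prices (
    currency text        NOT NULL
             CONSTRAINT prices_currency_fkey REFERENCES currency,  -- hypothetical name
    side     boolean     NOT NULL,
    price    numeric     NOT NULL,
    ts       timestamptz NOT NULL
);

INSERT INTO currency VALUES ('EUR'), ('USD');
```

An insert into prices with a currency that is not in the lookup table then fails with a foreign key violation, and the lookup table gives the LATERAL query a small, complete set of currencies to iterate over.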

Based on your original simple test case:

SELECT c.currency, s.side, p.*
FROM   currency c
CROSS  JOIN (VALUES (true), (false)) s(side)  -- account for side
CROSS  JOIN LATERAL (
   SELECT ts, price              -- more columns?
   FROM   prices
   WHERE  currency = c.currency
   AND    side = s.side
   ORDER  BY ts DESC             -- ts is NOT NULL
   LIMIT  1
   ) p
ORDER  BY 1, 2;  -- optional, whatever you prefer;

You should see very fast index scans on an index on (currency, side, ts DESC).

If index-only scans are possible and you only need ts and price, you might append price as the last column of the index.
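A sketch of that index, assuming PostgreSQL 10: the INCLUDE clause for non-key columns only arrived in PostgreSQL 11, so price is appended as an ordinary trailing key column. The index name is hypothetical. Index-only scans can only skip the heap for pages the visibility map marks all-visible, so the table needs to be vacuumed regularly for this to pay off.

```sql
-- Throwaway copy of the simplified test table
CREATE TEMP TABLE prices (
    currency text        NOT NULL,
    side     boolean     NOT NULL,
    price    numeric     NOT NULL,
    ts       timestamptz NOT NULL
);

-- price as trailing key column, so the LATERAL subquery (ts, price)
-- can potentially be answered from the index alone
CREATE INDEX prices_currency_side_ts_price_idx  -- hypothetical name
    ON prices (currency, side, ts DESC, price);

-- After bulk changes, run VACUUM prices; so pages become all-visible.
-- On PostgreSQL 11+ the covering variant could instead be written as:
-- CREATE INDEX ... ON prices (currency, side, ts DESC) INCLUDE (price);
```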

dbfiddle here

Whether or not you save this query in a VIEW does not affect performance.
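A self-contained sketch of the LATERAL query above saved as a view, on throwaway temporary tables with made-up sample rows. Reusing the name last_prices from the question is an assumption for illustration; the view is declared TEMP because it references temporary tables.

```sql
-- Throwaway setup: simplified test case with made-up sample rows
CREATE TEMP TABLE currency (currency text PRIMARY KEY);
CREATE TEMP TABLE prices (
    currency text        NOT NULL REFERENCES currency,
    side     boolean     NOT NULL,
    price    numeric     NOT NULL,
    ts       timestamptz NOT NULL
);
CREATE INDEX ON prices (currency, side, ts DESC);

INSERT INTO currency VALUES ('EUR'), ('USD');
INSERT INTO prices VALUES
    ('EUR', true,  1.10, '2018-01-01 10:00+00'),
    ('EUR', true,  1.12, '2018-01-01 10:05+00'),
    ('EUR', false, 1.09, '2018-01-01 10:03+00'),
    ('USD', true,  1.00, '2018-01-01 09:59+00');

-- The LATERAL query, saved as a view: one index probe per (currency, side)
CREATE TEMP VIEW last_prices AS
SELECT c.currency, s.side, p.ts, p.price
FROM   currency c
CROSS  JOIN (VALUES (true), (false)) s(side)
CROSS  JOIN LATERAL (
   SELECT ts, price
   FROM   prices
   WHERE  currency = c.currency
   AND    side = s.side
   ORDER  BY ts DESC
   LIMIT  1
   ) p;

SELECT * FROM last_prices ORDER BY currency, side;
```

With CROSS JOIN LATERAL, a (currency, side) combination without any rows in prices drops out of the result (USD/false in the sample data). Using LEFT JOIN LATERAL (...) p ON true instead would keep such combinations with NULL for ts and price.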