iva*_*rec 6 postgresql performance index-tuning greatest-n-per-group postgresql-10 postgresql-performance
I have a time-series table, prices, in a PostgreSQL 10 DB.
Here is a simplified test case to illustrate the problem:
CREATE TABLE prices (
currency text NOT NULL,
side boolean NOT NULL,
price numeric NOT NULL,
ts timestamptz NOT NULL
);
I want to quickly query the last value for each currency/side pair, since that gives me the current buy/sell price for every currency.
My current solution is:
create index on prices (currency, side, ts desc);
select distinct on (currency, side) *
from prices
order by currency, side, ts desc;
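But this gives me a very slow query (~500 ms) on a table with only ~30k rows.
In the actual table I want to group by four columns instead of two. Here is what the actual table and query look like: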
create table prices (
exchange integer not null,
pair text not null,
side boolean not null,
guaranteed_volume numeric not null,
ts timestamp with time zone not null,
price numeric not null,
constraint prices_pkey primary key (exchange, pair, side, guaranteed_volume, ts),
constraint prices_exchange_fkey foreign key (exchange)
references exchanges (id) match simple
on update no action
on delete no action
);
create index prices_exchange_pair_side_guaranteed_volume_ts_idx
on prices (exchange, pair, side, guaranteed_volume, ts desc);
create view last_prices as
select distinct on (exchange, pair, side, guaranteed_volume)
exchange
, pair
, side
, guaranteed_volume
, price
, ts
from prices
order by exchange
, pair
, side
, guaranteed_volume
, ts desc;
The table currently has 34441 rows. Some useful debugging queries:
# explain (analyze,buffers) select * from last_prices;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Unique (cost=2662.03..2997.71 rows=1224 width=37) (actual time=403.218..459.041 rows=392 loops=1)
Buffers: shared hit=418
-> Sort (cost=2662.03..2729.17 rows=26854 width=37) (actual time=403.213..411.041 rows=28353 loops=1)
Sort Key: prices.exchange, prices.pair, prices.side, prices.guaranteed_volume, prices.ts DESC
Sort Method: quicksort Memory: 2984kB
Buffers: shared hit=418
-> Seq Scan on prices (cost=0.00..686.54 rows=26854 width=37) (actual time=0.022..31.407 rows=28353 loops=1)
Buffers: shared hit=418
Planning time: 0.911 ms
Execution time: 460.190 ms
EXPLAIN ANALYZE with seqscan disabled:
# explain (analyze,buffers) select * from last_prices;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.41..4458.07 rows=1224 width=37) (actual time=0.037..122.237 rows=392 loops=1)
Buffers: shared hit=15182
-> Index Scan using prices_exchange_pair_side_guaranteed_volume_ts_idx on prices (cost=0.41..4189.53 rows=26854 width=37) (actual time=0.034..91.237 rows=29649 loops=1)
Buffers: shared hit=15182
Planning time: 0.291 ms
Execution time: 122.417 ms
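The question does not show how the sequential scan was disabled. A likely way, assuming the standard enable_seqscan planner setting was used for the session, is sketched here:

```sql
-- Session-local planner switch. It heavily penalizes sequential scans
-- in cost estimates rather than forbidding them outright.
SET enable_seqscan = off;

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM last_prices;

-- Restore the default so later queries are planned normally.
RESET enable_seqscan;
```

This is useful for diagnosis only; it is not a setting to leave switched off in production.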
Adding the plan for the view's query when run directly, bypassing the view:
# explain (analyze, buffers)
select distinct on (exchange, pair, side, guaranteed_volume)
exchange
, pair
, side
, guaranteed_volume
, price
, ts
from prices
order by exchange
, pair
, side
, guaranteed_volume
, ts desc;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Unique (cost=2163.56..2429.99 rows=1224 width=37) (actual time=364.716..391.405 rows=380 loops=1)
Buffers: shared hit=418
-> Sort (cost=2163.56..2216.85 rows=21314 width=37) (actual time=364.711..370.458 rows=24011 loops=1)
Sort Key: exchange, pair, side, guaranteed_volume, ts DESC
Sort Method: quicksort Memory: 2644kB
Buffers: shared hit=418
-> Seq Scan on prices (cost=0.00..631.14 rows=21314 width=37) (actual time=0.025..13.751 rows=24011 loops=1)
Buffers: shared hit=418
Planning time: 0.258 ms
Execution time: 392.110 ms
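"I want to quickly query the last value for each currency/side pair"
DISTINCT ON is good for few rows per combination of interest. But your use case obviously has many rows per distinct (currency, side), so DISTINCT ON is a bad choice with regard to performance. You will find a detailed assessment and an arsenal of solutions in two related answers on SO.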
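If you only need the latest timestamp ts, the case is very simple, because that column is both the sort criterion and the desired return value. See Evan's simple solution with max(ts).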
(Well, ideally you would have an index on (currency, side, ts DESC NULLS LAST), since max(ts) ignores NULL values and matches that sort order better. But that hardly matters for a column defined NOT NULL.)
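Evan's answer is not reproduced here. As a minimal sketch of what a max(ts) query for the simplified test table might look like (the alias last_ts is illustrative only):

```sql
-- Latest timestamp per (currency, side).
-- Note: this returns only the timestamp, not the matching price.
SELECT currency, side, max(ts) AS last_ts
FROM   prices
GROUP  BY currency, side;
```

Because the aggregate yields only ts, this form stops being sufficient as soon as other columns of the latest row are needed.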
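Typically, you need additional columns from each selected row (like the current price!) and/or you need to sort by multiple columns, so you have to do more.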
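Ideally, you have another table listing all currencies, plus a FK constraint to enforce referential integrity and disallow non-existent currencies. Then use the query technique from chapter "2a. LATERAL join" in the linked answer, expanded to account for the added side.
Based on your initial simple test case: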
SELECT c.currency, s.side, p.*
FROM currency c
CROSS JOIN (VALUES (true), (false)) s(side) -- account for side
CROSS JOIN LATERAL (
SELECT ts, price -- more columns?
FROM prices
WHERE currency = c.currency
AND side = s.side
ORDER BY ts DESC -- ts is NOT NULL
LIMIT 1
) p
ORDER BY 1, 2; -- optional, whatever you prefer;
You should see very fast index scans on the index on (currency, side, ts DESC).
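The query above assumes a table named currency exists. The question does not define one, so the following is only a sketch of how such a lookup table could be created and wired up; the constraint name prices_currency_fkey is hypothetical:

```sql
-- Hypothetical lookup table; name and column match the LATERAL query above.
CREATE TABLE currency (
   currency text PRIMARY KEY
);

-- One-time population from the existing data.
INSERT INTO currency (currency)
SELECT DISTINCT currency
FROM   prices;

-- Enforce referential integrity going forward.
ALTER TABLE prices
   ADD CONSTRAINT prices_currency_fkey
   FOREIGN KEY (currency) REFERENCES currency (currency);
```

With the FK in place, new currencies must be inserted into currency before their first price row arrives.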
If index-only scans are possible and you only need ts and price, it may pay to add price as the last column of the index.
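A possible shape for such an index on the simplified test table is sketched below. The index name is illustrative, and price is appended as a trailing key column because the INCLUDE clause only arrived in PostgreSQL 11:

```sql
-- Covering index for the LATERAL subquery: the leading columns match the
-- WHERE and ORDER BY clauses, and price rides along as a trailing key column.
CREATE INDEX prices_currency_side_ts_price_idx
   ON prices (currency, side, ts DESC, price);

-- Index-only scans depend on an up-to-date visibility map.
VACUUM ANALYZE prices;
```

Whether the planner actually picks an index-only scan depends on how much of the table is marked all-visible, so it is worth checking with EXPLAIN (ANALYZE, BUFFERS).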
dbfiddle here
Whether you save this query in a VIEW or not does not affect performance.
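For illustration, the LATERAL query could be wrapped in a view along these lines. The view name last_prices_lateral and the 'USD' filter value are hypothetical, and the sketch assumes the currency lookup table used in the query above:

```sql
-- Same technique as the standalone query, stored as a view.
CREATE VIEW last_prices_lateral AS
SELECT c.currency, s.side, p.ts, p.price
FROM   currency c
CROSS  JOIN (VALUES (true), (false)) s(side)  -- account for side
CROSS  JOIN LATERAL (
   SELECT ts, price
   FROM   prices
   WHERE  currency = c.currency
   AND    side = s.side
   ORDER  BY ts DESC                          -- ts is NOT NULL
   LIMIT  1
   ) p;

-- Usage: latest buy and sell price for a single currency.
SELECT * FROM last_prices_lateral WHERE currency = 'USD';
```

The view omits the final ORDER BY so that callers can sort however they prefer.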