Pep*_*oen 11 postgresql performance architecture
My table looks like this:
Column | Type |
-----------------------+-------------------+
id | integer |
source_id | integer |
timestamp | integer |
observation_timestamp | integer |
value | double precision |
Indexes exist on source_id, on timestamp, and on the combination of id and timestamp (CREATE INDEX timeseries_id_timestamp_combo_idx ON timeseries (id, "timestamp" DESC NULLS LAST)).
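It contains 20M rows (well, 120M in total, but 20M with source_id = 1). Many entries share the same timestamp but have different observation_timestamp values; each describes a value that occurs at timestamp and was reported or observed at observation_timestamp. For example, the temperature forecast for 2 pm tomorrow as predicted today at 12 am.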
Ideally, this table does a few things well:
The second one is the core of this question.
The data in the table looks like this:
id  source_id  timestamp   observation_timestamp  value
1   1          1531084900  1531083900             9999
2   1          1531084900  1531082900             1111
3   1          1531085900  1531083900             8888
4   1          1531085900  1531082900             7777
5   1          1531086900  1531082900             5555
The output of the query should look like this (only the row with the latest observation_timestamp for each timestamp):
id  source_id  timestamp   observation_timestamp  value
1   1          1531084900  1531083900             9999
3   1          1531085900  1531083900             8888
5   1          1531086900  1531082900             5555
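For anyone who wants to reproduce the expected result on a small scale, here is a minimal sketch. The table name timeseries_demo is hypothetical, and the column definitions are assumed from the column listing above (no constraints or indexes from the real table are implied). It loads the five sample rows and selects the latest observation per timestamp with DISTINCT ON.

```sql
-- Hypothetical scratch table; columns assumed from the listing in the question.
CREATE TEMP TABLE timeseries_demo (
    id                    integer,
    source_id             integer,
    "timestamp"           integer,
    observation_timestamp integer,
    value                 double precision
);

-- The five sample rows shown in the question.
INSERT INTO timeseries_demo (id, source_id, "timestamp", observation_timestamp, value) VALUES
    (1, 1, 1531084900, 1531083900, 9999),
    (2, 1, 1531084900, 1531082900, 1111),
    (3, 1, 1531085900, 1531083900, 8888),
    (4, 1, 1531085900, 1531082900, 7777),
    (5, 1, 1531086900, 1531082900, 5555);

-- Keep, per "timestamp", the row with the greatest observation_timestamp.
SELECT DISTINCT ON ("timestamp") *
FROM timeseries_demo
WHERE source_id = 1
ORDER BY "timestamp", observation_timestamp DESC;
-- Returns the rows with id 1, 3 and 5.
```

On five rows any formulation is instant; the sketch only pins down the intended semantics, not the performance behaviour at 20M rows.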
I have already consulted several resources on optimizing these queries, namely
... with limited success.
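I have considered creating a separate table holding the timestamp values so that they are easier to reference laterally, but because their cardinality is relatively high I doubt it would help. I am also concerned that it would hamper batch inserting new entries.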
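I am looking at three queries, and all of them perform badly.
(I know they do not do exactly the same thing at the moment, but as far as I can see they illustrate the query types well.)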
Recursive CTE with a LATERAL join
WITH RECURSIVE cte AS (
   (
   SELECT ts
   FROM timeseries ts
   WHERE source_id = 1
   ORDER BY id, "timestamp" DESC NULLS LAST
   LIMIT 1
   )
   UNION ALL
   SELECT (
      SELECT ts1
      FROM timeseries ts1
      WHERE id > (c.ts).id
      AND source_id = 1
      ORDER BY id, "timestamp" DESC NULLS LAST
      LIMIT 1
      )
   FROM cte c
   WHERE (c.ts).id IS NOT NULL
   )
SELECT (ts).*
FROM cte
WHERE (ts).id IS NOT NULL
ORDER BY (ts).id;
Performance:
Sort (cost=164999681.98..164999682.23 rows=100 width=28)
  Sort Key: ((cte.ts).id)
  CTE cte
    -> Recursive Union (cost=1653078.24..164999676.64 rows=101 width=52)
         -> Subquery Scan on *SELECT* 1 (cost=1653078.24..1653078.26 rows=1 width=52)
              -> Limit (cost=1653078.24..1653078.25 rows=1 width=60)
                   -> Sort (cost=1653078.24..1702109.00 rows=19612304 width=60)
                        Sort Key: ts.id, ts.timestamp DESC NULLS LAST
                        -> Bitmap Heap Scan on timeseries ts (cost=372587.92..1555016.72 rows=19612304 width=60)
                             Recheck Cond: (source_id = 1)
                             -> Bitmap Index Scan on ix_timeseries_source_id (cost=0.00..367684.85 rows=19612304 width=0)
                                  Index Cond: (source_id = 1)
         -> WorkTable Scan on cte c (cost=0.00..16334659.64 rows=10 width=32)
              Filter: ((ts).id IS NOT NULL)
              SubPlan 1
                -> Limit (cost=1633465.94..1633465.94 rows=1 width=60)
                     -> Sort (cost=1633465.94..1649809.53 rows=6537435 width=60)
                          Sort Key: ts1.id, ts1.timestamp DESC NULLS LAST
                          -> Bitmap Heap Scan on timeseries ts1 (cost=369319.21..1600778.77 rows=6537435 width=60)
                               Recheck Cond: (source_id = 1)
                               Filter: (id > (c.ts).id)
                               -> Bitmap Index Scan on ix_timeseries_source_id (cost=0.00..367684.85 rows=19612304 width=0)
                                    Index Cond: (source_id = 1)
  -> CTE Scan on cte (cost=0.00..2.02 rows=100 width=28)
       Filter: ((ts).id IS NOT NULL)
(EXPLAIN only; EXPLAIN ANALYZE could not be run, since the query takes more than 24 hours to complete.)
Window function
WITH summary AS (
SELECT ts.id, ts.source_id, ts.value,
ROW_NUMBER() OVER(PARTITION BY ts.timestamp ORDER BY ts.observation_timestamp DESC) AS rn
FROM timeseries ts
WHERE source_id = 1
)
SELECT s.*
FROM summary s
WHERE s.rn = 1;
Performance:
CTE Scan on summary s (cost=5530627.97..5971995.66 rows=98082 width=24) (actual time=150368.441..226331.286 rows=88404 loops=1)
  Filter: (rn = 1)
  Rows Removed by Filter: 20673704
  CTE summary
    -> WindowAgg (cost=5138301.13..5530627.97 rows=19616342 width=32) (actual time=150368.429..171189.504 rows=20762108 loops=1)
         -> Sort (cost=5138301.13..5187341.98 rows=19616342 width=24) (actual time=150368.405..165390.033 rows=20762108 loops=1)
              Sort Key: ts.timestamp, ts.observation_timestamp DESC
              Sort Method: external merge Disk: 689752kB
              -> Bitmap Heap Scan on timeseries ts (cost=372675.22..1555347.49 rows=19616342 width=24) (actual time=2767.542..50399.741 rows=20762108 loops=1)
                   Recheck Cond: (source_id = 1)
                   Rows Removed by Index Recheck: 217784
                   Heap Blocks: exact=48415 lossy=106652
                   -> Bitmap Index Scan on ix_timeseries_source_id (cost=0.00..367771.13 rows=19616342 width=0) (actual time=2757.245..2757.245 rows=20762630 loops=1)
                        Index Cond: (source_id = 1)
Planning time: 0.186 ms
Execution time: 234883.090 ms
DISTINCT ON
SELECT DISTINCT ON (timestamp) *
FROM timeseries
WHERE source_id = 1
ORDER BY timestamp, observation_timestamp DESC;
Performance:
Unique (cost=5339449.63..5437531.34 rows=15991 width=28) (actual time=112653.438..121397.944 rows=88404 loops=1)
  -> Sort (cost=5339449.63..5388490.48 rows=19616342 width=28) (actual time=112653.437..120175.512 rows=20762108 loops=1)
       Sort Key: timestamp, observation_timestamp DESC
       Sort Method: external merge Disk: 770888kB
       -> Bitmap Heap Scan on timeseries (cost=372675.22..1555347.49 rows=19616342 width=28) (actual time=2091.585..56109.942 rows=20762108 loops=1)
            Recheck Cond: (source_id = 1)
            Rows Removed by Index Recheck: 217784
            Heap Blocks: exact=48415 lossy=106652
            -> Bitmap Index Scan on ix_timeseries_source_id (cost=0.00..367771.13 rows=19616342 width=0) (actual time=2080.054..2080.054 rows=20762630 loops=1)
                 Index Cond: (source_id = 1)
Planning time: 0.132 ms
Execution time: 161651.006 ms
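How should I structure my data, are there scans that should not be there, and is it generally possible to get these queries down to ~1s (rather than ~120s)?
Is there a different way of querying the data to get the results I want?
If not, what different infrastructure or architecture should I be looking at?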
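For the recursive CTE query, the final ORDER BY (ts).id is not needed, because the CTE already produces the rows in that order. Removing it should make the query much faster: it can stop early instead of generating 20,180,572 rows and then throwing away all but 500 of them. Building an index on (source_id, id, timestamp desc nulls last) should improve it further.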
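A sketch of what this could look like follows. The index name is hypothetical, and the LIMIT 500 is an assumption taken from the "500 rows" mentioned above, since the query as posted has no LIMIT. Without an ORDER BY the SQL standard does not promise any output order, so this relies on the recursive CTE emitting rows in the order it generates them.

```sql
-- Hypothetical index name; the column list is the one suggested above.
CREATE INDEX timeseries_source_id_id_timestamp_idx
    ON timeseries (source_id, id, "timestamp" DESC NULLS LAST);

WITH RECURSIVE cte AS (
   (
   SELECT ts
   FROM timeseries ts
   WHERE source_id = 1
   ORDER BY id, "timestamp" DESC NULLS LAST
   LIMIT 1
   )
   UNION ALL
   SELECT (
      SELECT ts1
      FROM timeseries ts1
      WHERE id > (c.ts).id
      AND source_id = 1
      ORDER BY id, "timestamp" DESC NULLS LAST
      LIMIT 1
      )
   FROM cte c
   WHERE (c.ts).id IS NOT NULL
   )
SELECT (ts).*
FROM cte
WHERE (ts).id IS NOT NULL
-- No final ORDER BY: the outer query can stop pulling rows from the CTE early.
LIMIT 500;  -- assumed row limit, not part of the query as posted
```

With the index in place, each LIMIT 1 step can be answered from the index in the requested order, rather than by sorting millions of rows as in the plan shown in the question.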
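For the other two queries, increasing work_mem enough for the bitmap to fit in memory (which gets rid of the lossy heap blocks) would help somewhat. It would not help as much as a custom index, though, such as (source_id, "timestamp", observation_timestamp DESC), or better yet, for index-only scans, (source_id, "timestamp", observation_timestamp DESC, value, id).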
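As a rough sketch of these suggestions: the index names below are hypothetical, and the work_mem value is only a placeholder, not a recommendation for this workload. The DISTINCT ON query is the one from the question with its columns listed explicitly, so that the wider index can cover it.

```sql
-- Session-level setting; '256MB' is a placeholder value, tune it for your system.
SET work_mem = '256MB';

-- Index matching the WHERE clause and the ORDER BY of the DISTINCT ON query.
CREATE INDEX timeseries_source_ts_obs_idx
    ON timeseries (source_id, "timestamp", observation_timestamp DESC);

-- Wider variant that also carries value and id, to allow index-only scans.
CREATE INDEX timeseries_source_ts_obs_value_id_idx
    ON timeseries (source_id, "timestamp", observation_timestamp DESC, value, id);

-- Check whether the planner now avoids the bitmap heap scan and the external sort.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON ("timestamp")
       id, source_id, "timestamp", observation_timestamp, value
FROM timeseries
WHERE source_id = 1
ORDER BY "timestamp", observation_timestamp DESC;
```

Only one of the two indexes is needed in practice. Index-only scans also depend on the visibility map being reasonably up to date, so a recent VACUUM on the table matters, especially right after large batch inserts.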