Bru*_*uno 8 postgresql performance
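Using PostgreSQL (8.4), I'm creating views that summarize various results from a few tables (e.g. creating columns a, b, c in the view), and then I need to combine some of these results in the same query (e.g. a+b, a-b, (a+b)/c, ...) to produce the final results. What I've noticed is that the intermediate results are fully computed every time they are used, even when this happens within the same query.
Is there a way to optimize this so that the same results are not computed every time?
Here is a simplified example that reproduces the problem.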
CREATE TABLE test1 (
id SERIAL PRIMARY KEY,
log_timestamp TIMESTAMP NOT NULL
);
CREATE TABLE test2 (
test1_id INTEGER NOT NULL REFERENCES test1(id),
category VARCHAR(10) NOT NULL,
col1 INTEGER,
col2 INTEGER
);
CREATE INDEX test_category_idx ON test2(category);
-- Added after edit to this question
CREATE INDEX test_id_idx ON test2(test1_id);
-- Populating with test data.
INSERT INTO test1(log_timestamp)
SELECT * FROM generate_series('2011-01-01'::timestamp, '2012-01-01'::timestamp, '1 hour');
INSERT INTO test2
SELECT id, substr(upper(md5(random()::TEXT)), 1, 1),
(20000*random()-10000)::int, (3000*random()-200)::int FROM test1;
INSERT INTO test2
SELECT id, substr(upper(md5(random()::TEXT)), 1, 1),
(2000*random()-1000)::int, (3000*random()-200)::int FROM test1;
INSERT INTO test2
SELECT id, substr(upper(md5(random()::TEXT)), 1, 1),
(2000*random()-40)::int, (3000*random()-200)::int FROM test1;
Here is the view that performs the most time-consuming operations:
CREATE VIEW testview1 AS
SELECT
t1.id,
t1.log_timestamp,
(SELECT SUM(t2.col1) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='A') AS a,
(SELECT SUM(t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='B') AS b,
(SELECT SUM(t2.col1 - t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='C') AS c
FROM test1 t1;
SELECT a FROM testview1 produces this plan (via EXPLAIN ANALYZE):
Seq Scan on test1 t1 (cost=0.00..1787086.55 rows=8761 width=4) (actual time=12.877..10517.575 rows=8761 loops=1)
SubPlan 1
-> Aggregate (cost=203.96..203.97 rows=1 width=4) (actual time=1.193..1.193 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=36.49..203.95 rows=1 width=4) (actual time=1.109..1.177 rows=0 loops=8761)
Recheck Cond: ((category)::text = 'A'::text)
Filter: (test1_id = $0)
-> Bitmap Index Scan on test_category_idx (cost=0.00..36.49 rows=1631 width=0) (actual time=0.414..0.414 rows=1631 loops=8761)
Index Cond: ((category)::text = 'A'::text)
Total runtime: 10522.346 ms
SELECT a, a FROM testview1 produces this plan:
Seq Scan on test1 t1 (cost=0.00..3574037.50 rows=8761 width=4) (actual time=3.343..20550.817 rows=8761 loops=1)
SubPlan 1
-> Aggregate (cost=203.96..203.97 rows=1 width=4) (actual time=1.183..1.183 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=36.49..203.95 rows=1 width=4) (actual time=1.100..1.166 rows=0 loops=8761)
Recheck Cond: ((category)::text = 'A'::text)
Filter: (test1_id = $0)
-> Bitmap Index Scan on test_category_idx (cost=0.00..36.49 rows=1631 width=0) (actual time=0.418..0.418 rows=1631 loops=8761)
Index Cond: ((category)::text = 'A'::text)
SubPlan 2
-> Aggregate (cost=203.96..203.97 rows=1 width=4) (actual time=1.154..1.154 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=36.49..203.95 rows=1 width=4) (actual time=1.083..1.143 rows=0 loops=8761)
Recheck Cond: ((category)::text = 'A'::text)
Filter: (test1_id = $0)
-> Bitmap Index Scan on test_category_idx (cost=0.00..36.49 rows=1631 width=0) (actual time=0.426..0.426 rows=1631 loops=8761)
Index Cond: ((category)::text = 'A'::text)
Total runtime: 20557.581 ms
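Here, selecting a, a takes twice as long as selecting a, although the value could be computed just once. For example, with SELECT a, a+b, a-b FROM testview1, it goes through the subplan for a three times and the subplan for b twice, whereas the execution time could be reduced to 2/5 of the total (assuming + and - are negligible here).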
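One way to picture why the subplan count grows: a plain view is expanded into the calling query, so each reference to a view column is replaced by its defining expression. The following is a hand-written sketch of what SELECT a, a+b, a-b FROM testview1 is roughly equivalent to after that expansion. It is an illustration, not actual planner output, and the aliases a_plus_b and a_minus_b are made up.

```sql
-- Hand-expanded equivalent of: SELECT a, a+b, a-b FROM testview1;
-- Each reference to a or b becomes its own correlated subquery,
-- so a is evaluated three times and b twice for every row of test1.
SELECT
    (SELECT SUM(t2.col1) FROM test2 t2 WHERE t2.test1_id = t1.id AND category = 'A') AS a,
    (SELECT SUM(t2.col1) FROM test2 t2 WHERE t2.test1_id = t1.id AND category = 'A')
  + (SELECT SUM(t2.col2) FROM test2 t2 WHERE t2.test1_id = t1.id AND category = 'B') AS a_plus_b,
    (SELECT SUM(t2.col1) FROM test2 t2 WHERE t2.test1_id = t1.id AND category = 'A')
  - (SELECT SUM(t2.col2) FROM test2 t2 WHERE t2.test1_id = t1.id AND category = 'B') AS a_minus_b
FROM test1 t1;
```

That is five subqueries per row of which only two are distinct, which is where the 2/5 estimate comes from.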
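It's a good thing that it doesn't compute the unused columns (b and c) when they aren't needed, but is there a way to make it compute each used column from the view only once?
EDIT: @Frank Heikens correctly suggested using an index, which was missing from the example above. While it does improve the speed of each subplan, it doesn't prevent the same subquery from being computed multiple times. Sorry, I should have put this in the initial question to make things clear.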
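As a side note on the index mentioned in this edit, a composite index covering both predicates of the subqueries is one variation that could be tried. The index name below is made up, and whether the planner prefers it over test_id_idx depends on the data. It may speed up each subplan but, as said above, it does not change how many times a subplan runs.

```sql
-- Hypothetical composite index matching "test1_id = ... AND category = ..."
CREATE INDEX test2_id_category_idx ON test2 (test1_id, category);

-- Refresh planner statistics after bulk loading the test data.
ANALYZE test2;
```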
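(Sorry for answering my own question, but after reading this unrelated question and answer, it occurred to me that I should try a CTE. It works.)
Here is another view, similar to testview1 in the question, but using a common table expression: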
CREATE VIEW testview2 AS
WITH testcte AS (SELECT
t1.id,
t1.log_timestamp,
(SELECT SUM(t2.col1) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='A') AS a,
(SELECT SUM(t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='B') AS b,
(SELECT SUM(t2.col1 - t2.col2) FROM test2 t2 WHERE t2.test1_id=t1.id AND category='C') AS c
FROM test1 t1)
SELECT * FROM testcte;
(This is just an example; I'm not suggesting that combining a view and a CTE is necessarily a good idea: a CTE on its own may be enough.)
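A minimal sketch of the CTE-only variant hinted at above, with no view involved. The aliases a_plus_b and a_minus_b are made up for illustration. The assumption is that the server computes the CTE once and reads it back through a CTE Scan, as 8.4 does. PostgreSQL 12 and later may instead inline a CTE that is referenced only once, in which case writing AS MATERIALIZED (a keyword that does not exist on 8.4) asks for the older behaviour.

```sql
-- CTE used directly in the query, without wrapping it in a view.
-- On PostgreSQL 12+, "WITH testcte AS MATERIALIZED (...)" may be needed
-- to keep the CTE from being inlined; that keyword is not valid on 8.4.
WITH testcte AS (
    SELECT
        t1.id,
        t1.log_timestamp,
        (SELECT SUM(t2.col1) FROM test2 t2 WHERE t2.test1_id = t1.id AND category = 'A') AS a,
        (SELECT SUM(t2.col2) FROM test2 t2 WHERE t2.test1_id = t1.id AND category = 'B') AS b
    FROM test1 t1
)
SELECT id, a, a + b AS a_plus_b, a - b AS a_minus_b
FROM testcte;
```

Listing only the columns the query needs inside the CTE avoids paying for c, which a CTE wrapped in a view would always compute.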
Unlike with testview1, the query plan for SELECT a FROM testview2 now also computes b and c, which were skipped in testview1 because they were unused:
Subquery Scan testview2 (cost=395272.42..395535.25 rows=8761 width=8) (actual time=0.256..607.941 rows=8761 loops=1)
-> CTE Scan on testcte (cost=395272.42..395447.64 rows=8761 width=36) (actual time=0.255..604.106 rows=8761 loops=1)
CTE testcte
-> Seq Scan on test1 t1 (cost=0.00..395272.42 rows=8761 width=12) (actual time=0.252..589.358 rows=8761 loops=1)
SubPlan 1
-> Aggregate (cost=15.02..15.03 rows=1 width=4) (actual time=0.021..0.021 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=4) (actual time=0.015..0.015 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'A'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.009..0.009 rows=3 loops=8761)
Index Cond: (test1_id = $0)
SubPlan 2
-> Aggregate (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'B'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
Index Cond: (test1_id = $0)
SubPlan 3
-> Aggregate (cost=15.02..15.04 rows=1 width=8) (actual time=0.020..0.020 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=8) (actual time=0.013..0.014 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'C'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
Index Cond: (test1_id = $0)
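However, it doesn't recompute results that are used multiple times in the same query (which was the goal).
Unlike testview1, with which SELECT a, a, a, a, a took 5 times as long as SELECT a, here SELECT a, a, a, a, a, b, c, a+b, a+c, b+c FROM testview2 takes as long as SELECT a FROM testview2 or SELECT a, b, c FROM testview2. It goes through a, b and c only once: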
Subquery Scan testview2 (cost=395272.42..395600.96 rows=8761 width=24) (actual time=0.147..562.790 rows=8761 loops=1)
-> CTE Scan on testcte (cost=395272.42..395447.64 rows=8761 width=36) (actual time=0.144..554.194 rows=8761 loops=1)
CTE testcte
-> Seq Scan on test1 t1 (cost=0.00..395272.42 rows=8761 width=12) (actual time=0.140..542.657 rows=8761 loops=1)
SubPlan 1
-> Aggregate (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.013 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'A'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
Index Cond: (test1_id = $0)
SubPlan 2
-> Aggregate (cost=15.02..15.03 rows=1 width=4) (actual time=0.019..0.019 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'B'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.006..0.006 rows=3 loops=8761)
Index Cond: (test1_id = $0)
SubPlan 3
-> Aggregate (cost=15.02..15.04 rows=1 width=8) (actual time=0.018..0.019 rows=1 loops=8761)
-> Bitmap Heap Scan on test2 t2 (cost=4.28..15.02 rows=1 width=8) (actual time=0.012..0.012 rows=0 loops=8761)
Recheck Cond: (test1_id = $0)
Filter: ((category)::text = 'C'::text)
-> Bitmap Index Scan on test_if_idx (cost=0.00..4.28 rows=3 width=0) (actual time=0.007..0.007 rows=3 loops=8761)
Index Cond: (test1_id = $0)
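To tie this back to the combinations mentioned in the question (a+b, a-b, (a+b)/c), a query along these lines should reuse each CTE column instead of recomputing it. The output aliases are hypothetical, and NULLIF is added on the assumption that c can be zero for some rows, which would otherwise raise a division-by-zero error.

```sql
-- Derived columns built on top of the CTE-backed view.
-- NULLIF(c, 0) turns a zero divisor into NULL, so the ratio is NULL
-- for those rows instead of aborting the query.
EXPLAIN ANALYZE
SELECT
    a + b                  AS a_plus_b,
    a - b                  AS a_minus_b,
    (a + b) / NULLIF(c, 0) AS ratio
FROM testview2;
```

SUM over integer columns returns bigint, so the division here is integer division; casting one operand to numeric is an option if a fractional result is wanted.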