Mor*_*ras 8 performance index postgresql-9.6 query-performance
Suppose I have a table with 3 columns: a, b and c.
Can I use an index to speed up a query that looks like this?
SELECT a,b,SUM(c) -- or AVG(c)
FROM table
GROUP BY a,b
ORDER BY a,b
;
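If the answer to the above is yes, what type of index would you recommend, and how would it work?
Not really. GROUP BY and ORDER BY generally require a sort. In this case, however, a HashAggregate is used (probably because we are processing the whole table).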
CREATE TABLE foo AS
SELECT x % 5 AS a, x % 10 AS b, x AS c
FROM generate_series(1,1e6) AS x;
With the HashAggregate plan,
# EXPLAIN ANALYZE SELECT a,b,sum(c) FROM foo GROUP BY a,b ORDER BY a,b;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Sort (cost=23668.04..23668.16 rows=50 width=14) (actual time=611.607..611.608 rows=10 loops=1)
  Sort Key: a, b
  Sort Method: quicksort Memory: 25kB
  -> HashAggregate (cost=23666.00..23666.62 rows=50 width=14) (actual time=611.589..611.593 rows=10 loops=1)
        Group Key: a, b
        -> Seq Scan on foo (cost=0.00..16166.00 rows=1000000 width=14) (actual time=0.012..71.157 rows=1000000 loops=1)
Planning time: 0.168 ms
Execution time: 611.665 ms
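A side note on the plan above: the planner estimates rows=50 while the query actually returns 10 groups. As far as I can tell, this comes from how the test data is built. Since a is always b % 5, only 10 of the 5 × 10 possible (a, b) combinations exist, whereas the estimate treats the two columns as independent. A quick check, using only the foo table defined above:

```sql
-- Distinct values per column versus distinct (a, b) pairs in foo.
SELECT count(DISTINCT a)      AS distinct_a,   -- 5  (x % 5)
       count(DISTINCT b)      AS distinct_b,   -- 10 (x % 10)
       count(DISTINCT (a, b)) AS distinct_ab   -- 10, not 5 * 10
FROM foo;
```

The misestimate is harmless here, because either way the number of groups is tiny compared with the million input rows.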
So we add an index...
CREATE INDEX idx ON foo (a,b);
VACUUM FULL ANALYZE foo;
...and we still get the same query plan. So we disable HashAggregate,
SET enable_hashagg = false;
and try again...
# EXPLAIN ANALYZE SELECT a,b,sum(c) FROM foo GROUP BY a,b ORDER BY a,b;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=0.42..61292.04 rows=50 width=14) (actual time=108.149..655.536 rows=10 loops=1)
  Group Key: a, b
  -> Index Scan using idx on foo (cost=0.42..53791.41 rows=1000000 width=14) (actual time=0.066..272.299 rows=1000000 loops=1)
Planning time: 0.121 ms
Execution time: 655.594 ms
(5 rows)
And it takes longer: 655 ms, compared with 611 ms before.
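Keep in mind that a plain SET stays in effect for the rest of the session. For one-off experiments like this, a sketch that scopes the setting to a single transaction with SET LOCAL (so it cannot leak into later queries) might look like this:

```sql
BEGIN;
-- SET LOCAL lasts only until the end of the current transaction.
SET LOCAL enable_hashagg = off;

EXPLAIN ANALYZE
SELECT a, b, sum(c) FROM foo GROUP BY a, b ORDER BY a, b;

ROLLBACK;  -- enable_hashagg reverts to its previous value here
```

If the setting was already changed with a plain SET, RESET enable_hashagg; restores the default.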
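If that isn't fast enough (and 611 ms to group and sum a million rows isn't bad), you can use a MATERIALIZED VIEW, provided your workload allows it (that is, the query is hot and/or the data is rarely updated),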
CREATE MATERIALIZED VIEW foo2 AS
SELECT a,b,sum(c)
FROM foo
GROUP BY a,b
ORDER BY a,b;
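Now you can simply read the precomputed result with TABLE foo2; and run REFRESH MATERIALIZED VIEW foo2; whenever the view needs to be brought up to date. Alternatively, you can maintain a separate summary table and keep it current with a trigger.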
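Both options can be sketched as follows. This is only an illustration under stated assumptions: the names foo2_a_b_idx, foo_summary, foo_summary_add and foo_summary_add_trg are hypothetical, the columns are assumed to contain no NULLs, and the trigger handles INSERT only (UPDATE and DELETE on foo would need their own handling).

```sql
-- Option 1: refresh the materialized view without blocking readers.
-- CONCURRENTLY requires a unique index on the materialized view.
CREATE UNIQUE INDEX foo2_a_b_idx ON foo2 (a, b);
REFRESH MATERIALIZED VIEW CONCURRENTLY foo2;

-- Option 2: a summary table maintained by a trigger (hypothetical names).
CREATE TABLE foo_summary (
    a     numeric NOT NULL,
    b     numeric NOT NULL,
    total numeric NOT NULL,
    PRIMARY KEY (a, b)
);

-- Seed the summary from the rows that already exist.
INSERT INTO foo_summary (a, b, total)
SELECT a, b, sum(c) FROM foo GROUP BY a, b;

-- Add each newly inserted c to the running total of its (a, b) group.
CREATE FUNCTION foo_summary_add() RETURNS trigger AS $$
BEGIN
    INSERT INTO foo_summary (a, b, total)
    VALUES (NEW.a, NEW.b, NEW.c)
    ON CONFLICT (a, b)
    DO UPDATE SET total = foo_summary.total + EXCLUDED.total;
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_summary_add_trg
AFTER INSERT ON foo
FOR EACH ROW EXECUTE PROCEDURE foo_summary_add();
```

The trigger keeps the totals current on every insert, at the cost of extra work per row and contention on the few summary rows. The materialized view costs nothing at write time but is stale until the next refresh.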
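There are some exceptions, but sum() is not one of them. Most aggregates don't use indexes because they generally don't need one. The exceptions are order-specific aggregates (like min() and max()). So, for example, if you run a sum(a) after we have created the index on (a,b),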
# EXPLAIN ANALYZE SELECT sum(a) FROM foo;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Aggregate (cost=18666.00..18666.01 rows=1 width=4) (actual time=287.063..287.063 rows=1 loops=1)
  -> Seq Scan on foo (cost=0.00..16166.00 rows=1000000 width=4) (actual time=0.015..85.435 rows=1000000 loops=1)
Planning time: 0.098 ms
Execution time: 287.104 ms
(4 rows)
You can see it still uses a seq scan. You would get the same plan for sum(c) with no index at all. Now here's the kicker,
# EXPLAIN ANALYZE SELECT min(a) FROM foo;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Result (cost=0.48..0.49 rows=1 width=0) (actual time=0.041..0.041 rows=1 loops=1)
  InitPlan 1 (returns $0)
    -> Limit (cost=0.42..0.48 rows=1 width=4) (actual time=0.036..0.037 rows=1 loops=1)
          -> Index Only Scan using idx on foo (cost=0.42..56291.41 rows=1000000 width=4) (actual time=0.035..0.035 rows=1 loops=1)
                Index Cond: (a IS NOT NULL)
                Heap Fetches: 1
Planning time: 0.171 ms
Execution time: 0.080 ms
(8 rows)
Unlike sum(a), min(a) can take advantage of ordering, so the query planner recognizes that although an index scan is not free, here it pays off.
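The Limit over an Index Only Scan with Index Cond: (a IS NOT NULL) in that plan shows what is going on: min(a) is handled roughly like a query that reads the first non-null entry in index order. A hand-written near-equivalent, as a sketch:

```sql
-- Roughly what min(a) becomes when a btree index leads with column a.
SELECT a
FROM foo
WHERE a IS NOT NULL
ORDER BY a
LIMIT 1;

-- max(a) works the same way, reading the index from the other end.
SELECT a
FROM foo
WHERE a IS NOT NULL
ORDER BY a DESC
LIMIT 1;
```

One difference is that on an empty table min(a) returns a single row containing NULL, while the LIMIT form returns no rows. sum() has no such shortcut, because every row contributes to the result regardless of order.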
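If, for whatever reason, you want further proof that also indexing c makes no difference for the purpose of summing (and if you still don't see why after reading the above, please ask a question),
-- turn this back on; we turned it off earlier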
SET enable_hashagg = true;
DROP INDEX idx;
CREATE INDEX idx ON foo (a,b,c);
VACUUM FULL ANALYZE foo;
EXPLAIN ANALYZE SELECT a,b,sum(c) FROM foo GROUP BY a,b ORDER BY a,b;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Sort (cost=23668.04..23668.16 rows=50 width=14) (actual time=608.888..608.889 rows=10 loops=1)
  Sort Key: a, b
  Sort Method: quicksort Memory: 25kB
  -> HashAggregate (cost=23666.00..23666.62 rows=50 width=14) (actual time=608.869..608.871 rows=10 loops=1)
        Group Key: a, b
        -> Seq Scan on foo (cost=0.00..16166.00 rows=1000000 width=14) (actual time=0.015..72.613 rows=1000000 loops=1)
Planning time: 0.130 ms
Execution time: 608.947 ms
(8 rows)
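No improvement. Disabling hashagg doesn't improve it either.
In this specific and simple use case, the index doesn't matter. The planner picks the best approach.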