bal*_*teo 9 postgresql performance index postgresql-performance
My application has a message table, and one of the most frequently executed SQL queries on that table is the following:
select * from message message0_
where message0_.sender_id=? or message0_.recipient_id=?
order by message0_.send_date asc;
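I am basically querying for the messages that the current user has either received or sent. The parameter value for both sender_id and recipient_id is the current user's ID.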
I have created the following index:
CREATE INDEX ON message(recipient_id, sender_id);
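I would like to know whether the order of the columns in the index matters for my use case, bearing in mind the OR clause in my SQL query.
Can anyone help?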
For your use case, you actually need two different indexes:
CREATE INDEX ON messages(recipient_id);
CREATE INDEX ON messages(sender_id);
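Your multicolumn index will be used by the recipient_id = ? part of the query, but not by the sender_id = ? part, because sender_id is not the leading column of the index (other databases that know how to perform a skip scan, such as Oracle, could use it, although it would probably not be very efficient).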
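One way to see this for yourself is to compare the plans for each predicate when the two-column index is the only index available. The sketch below assumes a hypothetical scratch table named messages_demo with made-up random data; neither is part of the original schema, and the plans you get depend on your PostgreSQL version and table statistics.

```sql
-- Hypothetical scratch table with no primary key, so the composite index
-- is the only index the planner can choose
CREATE TEMP TABLE messages_demo (
    sender_id    integer   NOT NULL,
    recipient_id integer   NOT NULL,
    send_date    timestamp NOT NULL
);

-- Illustrative random data: one row every 25 seconds over 30 days
INSERT INTO messages_demo (sender_id, recipient_id, send_date)
SELECT (random()*999+1)::integer,
       (random()*999+1)::integer,
       d
FROM generate_series(timestamp '2017-01-01', timestamp '2017-01-31', interval '25 seconds') AS x(d);

CREATE INDEX ON messages_demo (recipient_id, sender_id);
ANALYZE messages_demo;

-- Condition on the leading column of the index
EXPLAIN SELECT * FROM messages_demo WHERE recipient_id = 123;

-- Condition on the second column only
EXPLAIN SELECT * FROM messages_demo WHERE sender_id = 123;
```

Comparing the two EXPLAIN outputs shows whether the planner treats the second predicate as an index condition or falls back to scanning the whole table.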
Depending on how the PRIMARY KEY of your message(s) table is defined, you may not need one of them (the implicit index associated with the PK will do the job).
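To check which indexes already exist on the table, including the one created implicitly for the primary key, you can query the pg_indexes system view. This is a minimal sketch that assumes the table is named messages; adjust the name to match your schema.

```sql
-- List every index on the table together with its definition
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'messages';
```

If the primary key's column list starts with sender_id, for example PRIMARY KEY (sender_id, recipient_id, send_date), its index can already serve lookups on sender_id alone, so only the recipient_id index needs to be added.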
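You can then use your query as is and let PostgreSQL perform a BitmapOr, or convert the query into a UNION and avoid the OR. In both cases the timing should be roughly the same. If you are sure that there are no messages whose sender_id and recipient_id are the same (i.e., nobody sends messages to himself/herself, or, if they do, you don't mind those messages showing up twice), a UNION ALL (which is slightly faster) can be used instead, with the same result.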
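If self-addressed messages can exist and you still want to avoid the duplicate-removal step of UNION, one possible variation is to exclude them from the second branch. This is a sketch rather than part of the original answer; it assumes recipient_id is declared NOT NULL (so that `<>` never yields NULL) and uses 123 as a sample user ID.

```sql
-- First branch: everything the user received (including messages to self)
SELECT *
FROM messages
WHERE recipient_id = 123
UNION ALL
-- Second branch: everything the user sent, except messages to self,
-- which the first branch has already returned
SELECT *
FROM messages
WHERE sender_id = 123
  AND recipient_id <> 123
ORDER BY send_date;
```

Each row can match at most one branch, so no row is returned twice and no deduplication is needed.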
Most databases tend not to handle OR conditions very well. This particular case, which is fairly common and fairly simple, is handled well by PostgreSQL.
Table definitions (simplified)
CREATE TABLE users
(
user_id integer PRIMARY KEY,
user_name text
) ;
CREATE TABLE messages
(
sender_id integer NOT NULL REFERENCES users(user_id)
ON UPDATE CASCADE ON DELETE RESTRICT,
recipient_id integer NOT NULL REFERENCES users(user_id)
ON UPDATE CASCADE ON DELETE RESTRICT,
send_date timestamp NOT NULL DEFAULT now(),
message_text text,
PRIMARY KEY(sender_id, recipient_id, send_date)
) ;
CREATE INDEX ON messages (recipient_id) ;
-- Following index not needed, given our primary key
-- CREATE INDEX ON messages (sender_id) ;
We populate the tables with some mock data:
-- Create 1000 users
INSERT INTO users (user_id, user_name)
SELECT
user_id, 'user' || user_id AS user_name
FROM
generate_series(1, 1000) AS x(user_id) ;
-- Create (approx.) 100000 messages
INSERT INTO messages (sender_id, recipient_id, send_date)
SELECT
(random()*999+1)::integer AS sender_id,
(random()*999+1)::integer AS recipient_id,
send_date
FROM
generate_series(timestamp '2017-01-01', timestamp '2017-01-31', interval '25 seconds') AS x(send_date);
ANALYZE;
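Before looking at plans, it can help to sanity-check the generated data. The following sketch counts the rows that were inserted and the messages a user sent to himself/herself, which are exactly the rows that UNION ALL would return twice. The column aliases are only illustrative.

```sql
-- One row every 25 seconds over 30 days, both endpoints included:
-- 30 * 86400 / 25 + 1 = 103681 rows
SELECT count(*) AS total_messages
FROM messages;

-- Messages whose sender and recipient are the same user
SELECT count(*) AS self_sent
FROM messages
WHERE sender_id = recipient_id;
```

The self_sent count varies from run to run because the IDs are random.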
At this point, we check which execution plans we get:
-- Checking the query with OR
EXPLAIN ANALYZE
SELECT *
FROM messages
WHERE recipient_id = 123 OR sender_id = 123
ORDER BY send_date ;
which gives:
QUERY PLAN
Sort  (cost=420.21..420.71 rows=200 width=48) (actual time=0.430..0.445 rows=192 loops=1)
  Sort Key: send_date
  Sort Method: quicksort  Memory: 34kB
  ->  Bitmap Heap Scan on messages  (cost=10.31..412.56 rows=200 width=48) (actual time=0.092..0.389 rows=192 loops=1)
        Recheck Cond: ((recipient_id = 123) OR (sender_id = 123))
        Heap Blocks: exact=157
        ->  BitmapOr  (cost=10.31..10.31 rows=200 width=0) (actual time=0.061..0.061 rows=0 loops=1)
              ->  Bitmap Index Scan on messages_recipient_id_idx  (cost=0.00..5.04 rows=100 width=0) (actual time=0.033..0.033 rows=97 loops=1)
                    Index Cond: (recipient_id = 123)
              ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.17 rows=100 width=0) (actual time=0.025..0.025 rows=95 loops=1)
                    Index Cond: (sender_id = 123)
Planning time: 0.379 ms
Execution time: 0.527 ms
And using UNION:
-- Not using OR, but UNION-ing
EXPLAIN ANALYZE
SELECT *
FROM messages
WHERE recipient_id = 123
UNION
SELECT *
FROM messages
WHERE sender_id = 123
ORDER BY send_date ;
you get the following query plan:
QUERY PLAN
Sort  (cost=538.87..539.37 rows=200 width=48) (actual time=0.387..0.399 rows=192 loops=1)
  Sort Key: messages.send_date
  Sort Method: quicksort  Memory: 34kB
  ->  HashAggregate  (cost=529.22..531.22 rows=200 width=48) (actual time=0.268..0.317 rows=192 loops=1)
        Group Key: messages.sender_id, messages.recipient_id, messages.send_date, messages.message_text
        ->  Append  (cost=5.07..527.22 rows=200 width=48) (actual time=0.038..0.192 rows=192 loops=1)
              ->  Bitmap Heap Scan on messages  (cost=5.07..262.55 rows=100 width=48) (actual time=0.038..0.094 rows=97 loops=1)
                    Recheck Cond: (recipient_id = 123)
                    Heap Blocks: exact=89
                    ->  Bitmap Index Scan on messages_recipient_id_idx  (cost=0.00..5.04 rows=100 width=0) (actual time=0.022..0.022 rows=97 loops=1)
                          Index Cond: (recipient_id = 123)
              ->  Bitmap Heap Scan on messages messages_1  (cost=5.19..262.67 rows=100 width=48) (actual time=0.033..0.085 rows=95 loops=1)
                    Recheck Cond: (sender_id = 123)
                    Heap Blocks: exact=86
                    ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.17 rows=100 width=0) (actual time=0.021..0.021 rows=95 loops=1)
                          Index Cond: (sender_id = 123)
Planning time: 0.178 ms
Execution time: 0.521 ms
And, for completeness, with UNION ALL:
QUERY PLAN
Sort  (cost=537.11..537.62 rows=201 width=48) (actual time=0.213..0.229 rows=160 loops=1)
  Sort Key: messages.send_date
  Sort Method: quicksort  Memory: 32kB
  ->  Append  (cost=5.08..529.42 rows=201 width=48) (actual time=0.034..0.173 rows=160 loops=1)
        ->  Bitmap Heap Scan on messages  (cost=5.08..264.74 rows=101 width=48) (actual time=0.034..0.086 rows=82 loops=1)
              Recheck Cond: (recipient_id = 123)
              Heap Blocks: exact=77
              ->  Bitmap Index Scan on messages_recipient_id_idx  (cost=0.00..5.05 rows=101 width=0) (actual time=0.022..0.022 rows=82 loops=1)
                    Index Cond: (recipient_id = 123)
        ->  Bitmap Heap Scan on messages messages_1  (cost=5.19..262.67 rows=100 width=48) (actual time=0.030..0.075 rows=78 loops=1)
              Recheck Cond: (sender_id = 123)
              Heap Blocks: exact=72
              ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.17 rows=100 width=0) (actual time=0.020..0.020 rows=78 loops=1)
                    Index Cond: (sender_id = 123)
Planning time: 0.222 ms
Execution time: 0.280 ms
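You can see that the execution plans and timings of the first two approaches are almost the same, whereas the UNION ALL plan is cheaper to execute because it skips the HashAggregate step that removes duplicates. (The UNION ALL run returned 160 rows rather than 192, so it was apparently taken against a different random data set and the timings are not strictly comparable.)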
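The statement that produced the UNION ALL plan is not shown here. Presumably it was the UNION query with ALL added, along the following lines; this is an assumption, not text taken from the original answer.

```sql
-- Not using OR, and not removing duplicates either
EXPLAIN ANALYZE
SELECT *
FROM messages
WHERE recipient_id = 123
UNION ALL
SELECT *
FROM messages
WHERE sender_id = 123
ORDER BY send_date;
```

Any message that user 123 sent to himself/herself matches both branches and is therefore returned twice by this form.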
If you do exactly the same, but without the CREATE INDEX ON messages (recipient_id); statement, what you get is:
For the query with OR:
QUERY PLAN
Sort  (cost=2123.86..2124.36 rows=200 width=48) (actual time=27.305..27.320 rows=227 loops=1)
  Sort Key: send_date
  Sort Method: quicksort  Memory: 35kB
  ->  Seq Scan on messages  (cost=0.00..2116.22 rows=200 width=48) (actual time=0.108..27.097 rows=227 loops=1)
        Filter: ((recipient_id = 123) OR (sender_id = 123))
        Rows Removed by Filter: 103454
Planning time: 0.474 ms
Execution time: 27.392 ms
For the UNION query:
QUERY PLAN
Sort  (cost=2135.60..2136.10 rows=201 width=48) (actual time=15.716..15.803 rows=227 loops=1)
  Sort Key: messages.send_date
  Sort Method: quicksort  Memory: 35kB
  ->  HashAggregate  (cost=2125.90..2127.91 rows=201 width=48) (actual time=15.571..15.621 rows=227 loops=1)
        Group Key: messages.sender_id, messages.recipient_id, messages.send_date, messages.message_text
        ->  Append  (cost=0.00..2123.89 rows=201 width=48) (actual time=0.061..15.371 rows=227 loops=1)
              ->  Seq Scan on messages  (cost=0.00..1857.01 rows=100 width=48) (actual time=0.060..15.092 rows=117 loops=1)
                    Filter: (recipient_id = 123)
                    Rows Removed by Filter: 103564
              ->  Bitmap Heap Scan on messages messages_1  (cost=5.20..264.87 rows=101 width=48) (actual time=0.086..0.248 rows=110 loops=1)
                    Recheck Cond: (sender_id = 123)
                    Heap Blocks: exact=101
                    ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.18 rows=101 width=0) (actual time=0.066..0.066 rows=110 loops=1)
                          Index Cond: (sender_id = 123)
Planning time: 0.333 ms
Execution time: 16.006 ms
For the UNION ALL query:
QUERY PLAN
Sort  (cost=2131.58..2132.08 rows=201 width=48) (actual time=14.847..14.865 rows=227 loops=1)
  Sort Key: messages.send_date
  Sort Method: quicksort  Memory: 35kB
  ->  Append  (cost=0.00..2123.89 rows=201 width=48) (actual time=0.076..14.731 rows=227 loops=1)
        ->  Seq Scan on messages  (cost=0.00..1857.01 rows=100 width=48) (actual time=0.076..14.497 rows=117 loops=1)
              Filter: (recipient_id = 123)
              Rows Removed by Filter: 103564
        ->  Bitmap Heap Scan on messages messages_1  (cost=5.20..264.87 rows=101 width=48) (actual time=0.082..0.209 rows=110 loops=1)
              Recheck Cond: (sender_id = 123)
              Heap Blocks: exact=101
              ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.18 rows=101 width=0) (actual time=0.060..0.060 rows=110 loops=1)
                    Index Cond: (sender_id = 123)
Planning time: 0.284 ms
Execution time: 14.931 ms
...which are worse query plans, because they need sequential scans. You can also check this approach on Rextester.
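To reproduce the comparison without permanently losing the index, one option is to drop it inside a transaction and roll back afterwards, since DDL is transactional in PostgreSQL. The sketch below assumes the index carries the auto-generated name messages_recipient_id_idx that appears in the plans; check the actual name on your system first.

```sql
-- Run on a test database only: DROP INDEX holds an exclusive lock
-- on the table until the transaction ends
BEGIN;

-- Assumed auto-generated index name, as seen in the query plans
DROP INDEX messages_recipient_id_idx;

-- Same OR query as before, now planned without the recipient_id index
EXPLAIN ANALYZE
SELECT *
FROM messages
WHERE recipient_id = 123 OR sender_id = 123
ORDER BY send_date;

-- Undo the DROP INDEX so the index is available again
ROLLBACK;
```

The same pattern works for the UNION and UNION ALL variants by swapping the statement between DROP INDEX and ROLLBACK.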