Multicolumn index and OR clause

bal*_*teo 9 postgresql performance index postgresql-performance

My application has a message table, and one of the most frequently executed SQL queries on that table is the following:

select * from message message0_ 
 where message0_.sender_id=? or message0_.recipient_id=?
 order by message0_.send_date asc;

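I am basically querying for the messages that the current user has either received or sent.

The parameter values for sender_id and recipient_id are both the current user's ID.

I have created the following index: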

CREATE INDEX ON message(recipient_id, sender_id);

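I would like to know whether the order of the columns in the index matters for my use case, bearing in mind the OR clause in my SQL query.

Can someone please help?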

joa*_*olo 8

For your use case, you actually need two different indexes:

CREATE INDEX ON messages(recipient_id);
CREATE INDEX ON messages(sender_id);

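Your multi-column index will be used for the recipient_id = ? part of the query, but not for the other part (it could be used by other databases that know how to perform skip scans, such as Oracle, although that might not be very efficient).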
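
One way to see this for yourself is to ask the planner about each half of the OR separately. The following is a minimal sketch, assuming the question's message table with only the index on (recipient_id, sender_id) and an illustrative user ID of 123. The plans you get depend on table size and statistics, so on a very small table the planner may choose a sequential scan for both.

```sql
-- Condition on the leading column of the index (recipient_id, sender_id):
-- the planner can descend the B-tree directly to the matching entries.
EXPLAIN
SELECT *
FROM message
WHERE recipient_id = 123;

-- Condition on the second column only: there is no leading-column value
-- to narrow the search, so the index is of little or no help here.
EXPLAIN
SELECT *
FROM message
WHERE sender_id = 123;
```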

Depending on how the PRIMARY KEY of your message(s) table is defined, you might not need one of them (the implicit index associated with the PK will do the job).
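
To check whether an existing index already covers one of the two lookups, you can list the indexes on the table. This is a small sketch that assumes the table is named messages. An index whose first column is sender_id (or recipient_id) can serve the corresponding equality condition, and that includes the index backing the primary key.

```sql
-- List every index on the table, including the one PostgreSQL
-- creates implicitly for the PRIMARY KEY constraint.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'messages';
```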

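You can then use your query as is and let PostgreSQL perform a BitmapOr, or convert the query into a UNION and avoid the OR. In both cases, the timings should be approximately the same. If you are sure that there are no messages where sender_id and recipient_id are the same (i.e. nobody sends messages to himself or herself, or, if they do, you don't mind showing them twice), a UNION ALL (which is slightly faster) can be used instead with the same result.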
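
If you are not sure whether self-addressed messages exist, the sketch below shows one way to check, plus a UNION ALL variant that does not depend on the answer. It is an illustration rather than part of the measurements in this answer. It assumes a table named messages, an example user ID of 123, and that recipient_id is NOT NULL; if the column can be NULL, replace `<>` with `IS DISTINCT FROM` so that such rows are not dropped.

```sql
-- Precondition check: how many messages did a user send to themselves?
SELECT count(*) AS self_sent
FROM messages
WHERE sender_id = recipient_id;

-- UNION ALL variant that cannot return a self-sent message twice:
-- the second branch skips the rows already returned by the first one.
SELECT *
FROM messages
WHERE recipient_id = 123
UNION ALL
SELECT *
FROM messages
WHERE sender_id = 123
  AND recipient_id <> 123
ORDER BY send_date;
```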

Most databases tend not to handle OR conditions very well. This particular case is quite common and fairly simple, and PostgreSQL handles it well.


Experimental check

Table definitions (simplified)

CREATE TABLE users
(
    user_id integer PRIMARY KEY,
    user_name text 
) ;

CREATE TABLE messages
(
    sender_id    integer NOT NULL REFERENCES users(user_id) 
                 ON UPDATE CASCADE ON DELETE RESTRICT,
    recipient_id integer NOT NULL REFERENCES users(user_id) 
                 ON UPDATE CASCADE ON DELETE RESTRICT,
    send_date    timestamp NOT NULL DEFAULT now(),
    message_text text,
    PRIMARY KEY(sender_id, recipient_id, send_date)
) ;
CREATE INDEX ON messages (recipient_id) ;
-- Following index not needed, given our primary key
-- CREATE INDEX ON messages (sender_id) ;

We populate the tables with some simulated data:

-- Create 1000 users
INSERT INTO users (user_id, user_name)
SELECT
    user_id, 'user' || user_id AS user_name
FROM
    generate_series(1, 1000) AS x(user_id) ;

-- Create (approx.) 100000 messages
INSERT INTO messages (sender_id, recipient_id, send_date)
SELECT
    (random()*999+1)::integer AS sender_id,
    (random()*999+1)::integer AS recipient_id,
    send_date
FROM
    generate_series(timestamp '2017-01-01', timestamp '2017-01-31', interval '25 seconds') AS x(send_date); 

ANALYZE;

At this point, we check which execution plans we get:

-- Checking the query with OR
EXPLAIN ANALYZE
SELECT * 
FROM messages
WHERE recipient_id = 123 OR sender_id = 123
ORDER BY send_date ;

which gives:

QUERY PLAN
Sort  (cost=420.21..420.71 rows=200 width=48) (actual time=0.430..0.445 rows=192 loops=1)
  Sort Key: send_date
  Sort Method: quicksort  Memory: 34kB
  ->  Bitmap Heap Scan on messages  (cost=10.31..412.56 rows=200 width=48) (actual time=0.092..0.389 rows=192 loops=1)
        Recheck Cond: ((recipient_id = 123) OR (sender_id = 123))
        Heap Blocks: exact=157
        ->  BitmapOr  (cost=10.31..10.31 rows=200 width=0) (actual time=0.061..0.061 rows=0 loops=1)
              ->  Bitmap Index Scan on messages_recipient_id_idx  (cost=0.00..5.04 rows=100 width=0) (actual time=0.033..0.033 rows=97 loops=1)
                    Index Cond: (recipient_id = 123)
              ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.17 rows=100 width=0) (actual time=0.025..0.025 rows=95 loops=1)
                    Index Cond: (sender_id = 123)
Planning time: 0.379 ms
Execution time: 0.527 ms

And using UNION:

-- Not using OR, but UNION-ing
EXPLAIN ANALYZE
SELECT * 
FROM messages
WHERE recipient_id = 123
UNION
SELECT *
FROM messages
WHERE sender_id = 123
ORDER BY send_date ;

you get the following query plan:

QUERY PLAN
Sort  (cost=538.87..539.37 rows=200 width=48) (actual time=0.387..0.399 rows=192 loops=1)
  Sort Key: messages.send_date
  Sort Method: quicksort  Memory: 34kB
  ->  HashAggregate  (cost=529.22..531.22 rows=200 width=48) (actual time=0.268..0.317 rows=192 loops=1)
        Group Key: messages.sender_id, messages.recipient_id, messages.send_date, messages.message_text
        ->  Append  (cost=5.07..527.22 rows=200 width=48) (actual time=0.038..0.192 rows=192 loops=1)
              ->  Bitmap Heap Scan on messages  (cost=5.07..262.55 rows=100 width=48) (actual time=0.038..0.094 rows=97 loops=1)
                    Recheck Cond: (recipient_id = 123)
                    Heap Blocks: exact=89
                    ->  Bitmap Index Scan on messages_recipient_id_idx  (cost=0.00..5.04 rows=100 width=0) (actual time=0.022..0.022 rows=97 loops=1)
                          Index Cond: (recipient_id = 123)
              ->  Bitmap Heap Scan on messages messages_1  (cost=5.19..262.67 rows=100 width=48) (actual time=0.033..0.085 rows=95 loops=1)
                    Recheck Cond: (sender_id = 123)
                    Heap Blocks: exact=86
                    ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.17 rows=100 width=0) (actual time=0.021..0.021 rows=95 loops=1)
                          Index Cond: (sender_id = 123)
Planning time: 0.178 ms
Execution time: 0.521 ms

And, for completeness, with UNION ALL:

QUERY PLAN
Sort  (cost=537.11..537.62 rows=201 width=48) (actual time=0.213..0.229 rows=160 loops=1)
  Sort Key: messages.send_date
  Sort Method: quicksort  Memory: 32kB
  ->  Append  (cost=5.08..529.42 rows=201 width=48) (actual time=0.034..0.173 rows=160 loops=1)
        ->  Bitmap Heap Scan on messages  (cost=5.08..264.74 rows=101 width=48) (actual time=0.034..0.086 rows=82 loops=1)
              Recheck Cond: (recipient_id = 123)
              Heap Blocks: exact=77
              ->  Bitmap Index Scan on messages_recipient_id_idx  (cost=0.00..5.05 rows=101 width=0) (actual time=0.022..0.022 rows=82 loops=1)
                    Index Cond: (recipient_id = 123)
        ->  Bitmap Heap Scan on messages messages_1  (cost=5.19..262.67 rows=100 width=48) (actual time=0.030..0.075 rows=78 loops=1)
              Recheck Cond: (sender_id = 123)
              Heap Blocks: exact=72
              ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.17 rows=100 width=0) (actual time=0.020..0.020 rows=78 loops=1)
                    Index Cond: (sender_id = 123)
Planning time: 0.222 ms
Execution time: 0.280 ms
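
The UNION ALL query itself is not shown with this plan. Judging from the plan shape (an Append over the same two scans, with no HashAggregate step), it was presumably the UNION query with UNION ALL substituted, along these lines. Treat this as a reconstruction, not the exact statement that was run:

```sql
-- Assumed form of the UNION ALL query behind the plan above
EXPLAIN ANALYZE
SELECT *
FROM messages
WHERE recipient_id = 123
UNION ALL
SELECT *
FROM messages
WHERE sender_id = 123
ORDER BY send_date;
```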

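You can see that the execution plans for the first two approaches (OR and UNION) are nearly equivalent, with execution times of about 0.53 ms and 0.52 ms, and that the UNION ALL one is better (0.28 ms). Note that the UNION ALL run returned 160 rows rather than 192, so it was apparently measured on a different random data set and the timings are not strictly comparable.

You can check all of this on Rextester.

Your original setup

If you do exactly the same but without the CREATE INDEX ON messages (recipient_id) ; statement, this is what you get for the OR, UNION and UNION ALL queries, in that order: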

QUERY PLAN
Sort  (cost=2123.86..2124.36 rows=200 width=48) (actual time=27.305..27.320 rows=227 loops=1)
  Sort Key: send_date
  Sort Method: quicksort  Memory: 35kB
  ->  Seq Scan on messages  (cost=0.00..2116.22 rows=200 width=48) (actual time=0.108..27.097 rows=227 loops=1)
        Filter: ((recipient_id = 123) OR (sender_id = 123))
        Rows Removed by Filter: 103454
Planning time: 0.474 ms
Execution time: 27.392 ms

QUERY PLAN
Sort  (cost=2135.60..2136.10 rows=201 width=48) (actual time=15.716..15.803 rows=227 loops=1)
  Sort Key: messages.send_date
  Sort Method: quicksort  Memory: 35kB
  ->  HashAggregate  (cost=2125.90..2127.91 rows=201 width=48) (actual time=15.571..15.621 rows=227 loops=1)
        Group Key: messages.sender_id, messages.recipient_id, messages.send_date, messages.message_text
        ->  Append  (cost=0.00..2123.89 rows=201 width=48) (actual time=0.061..15.371 rows=227 loops=1)
              ->  Seq Scan on messages  (cost=0.00..1857.01 rows=100 width=48) (actual time=0.060..15.092 rows=117 loops=1)
                    Filter: (recipient_id = 123)
                    Rows Removed by Filter: 103564
              ->  Bitmap Heap Scan on messages messages_1  (cost=5.20..264.87 rows=101 width=48) (actual time=0.086..0.248 rows=110 loops=1)
                    Recheck Cond: (sender_id = 123)
                    Heap Blocks: exact=101
                    ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.18 rows=101 width=0) (actual time=0.066..0.066 rows=110 loops=1)
                          Index Cond: (sender_id = 123)
Planning time: 0.333 ms
Execution time: 16.006 ms

QUERY PLAN
Sort  (cost=2131.58..2132.08 rows=201 width=48) (actual time=14.847..14.865 rows=227 loops=1)
  Sort Key: messages.send_date
  Sort Method: quicksort  Memory: 35kB
  ->  Append  (cost=0.00..2123.89 rows=201 width=48) (actual time=0.076..14.731 rows=227 loops=1)
        ->  Seq Scan on messages  (cost=0.00..1857.01 rows=100 width=48) (actual time=0.076..14.497 rows=117 loops=1)
              Filter: (recipient_id = 123)
              Rows Removed by Filter: 103564
        ->  Bitmap Heap Scan on messages messages_1  (cost=5.20..264.87 rows=101 width=48) (actual time=0.082..0.209 rows=110 loops=1)
              Recheck Cond: (sender_id = 123)
              Heap Blocks: exact=101
              ->  Bitmap Index Scan on messages_pkey  (cost=0.00..5.18 rows=101 width=0) (actual time=0.060..0.060 rows=110 loops=1)
                    Index Cond: (sender_id = 123)
Planning time: 0.284 ms
Execution time: 14.931 ms

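... which are much worse query plans, because each of them needs a sequential scan of the whole table (roughly 15 to 27 ms here, versus about 0.5 ms with both indexes in place). You can also check this approach on Rextester.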