Fon*_*exn 5 postgresql performance index postgresql-performance
我在 PostgreSQL 9.6中有一个查询需要很长时间才能运行:
SELECT DISTINCT ON (gc.number)
gc.number, gc.code, ch.client_id, ch.client_parent_id
FROM client gc, client_hierarchy ch
WHERE gc.code = ch.client_id
AND gc.number in (SELECT NUMB FROM directory
WHERE ACTIVE = TRUE
AND NUMB SIMILAR TO '(ABTR|GREW|POEW)%');
Run Code Online (Sandbox Code Playgroud)
SIMILAR TO接收许多不同NUMB的作为参数。ABTR|GREW|POEW大约 3.500 多。
表定义:
CREATE TABLE directory (
id BIGINT PRIMARY KEY NOT NULL,
active BOOLEAN NOT NULL,
numb VARCHAR(8),
branch VARCHAR(20),
city VARCHAR(50),
modified_date TIMESTAMP NOT NULL,
name VARCHAR(200)
);
CREATE INDEX dir_numb_index ON directory (numb);
CREATE INDEX dir_branch_index ON bank_directory (branch);
CREATE INDEX dir_city_index ON bank_directory (city);
CREATE INDEX dir_name_index ON bank_directory (name);
CREATE TABLE client (
id BIGINT PRIMARY KEY NOT NULL,
active BOOLEAN NOT NULL,
number VARCHAR(8),
code VARCHAR(9) NOT NULL,
modified_date TIMESTAMP NOT NULL,
name VARCHAR(200),
);
CREATE INDEX client_number_index ON client (number);
CREATE INDEX client_code_index ON client (code);
CREATE INDEX client_name_index ON client (name);
CREATE TABLE client_hierarchy (
id BIGINT PRIMARY KEY NOT NULL,
active BOOLEAN NOT NULL,
client_id VARCHAR(9),
client_parent_id VARCHAR(9),
modified_date TIMESTAMP NOT NULL
);
CREATE INDEX client_id_hierarchy_index ON client_hierarchy (client_id);
CREATE INDEX client_parent_id_hierarchy_index ON client_hierarchy (client_parent_id);
Run Code Online (Sandbox Code Playgroud)
表client有 5K 行和client_hierarchy900K 行。
的输出EXPLAIN (ANALYZE, BUFFERS):
Unique (cost=21869.20..21890.98 rows=3760 width=23) (actual time=3117.734..3118.110 rows=707 loops=1)
Buffers: shared hit=7786
-> Sort (cost=21869.20..21880.09 rows=4357 width=23) (actual time=3117.733..3117.869 rows=998 loops=1)
Sort Key: gc.number
Sort Method: quicksort Memory: 102kB
Buffers: shared hit=7786
-> Hash Semi Join (cost=1179.86..21605.84 rows=4357 width=23) (actual time=2825.705..3115.586 rows=998 loops=1)
Hash Cond: ((gc.number)::text = (directory.numb)::text)
Buffers: shared hit=7786
-> Hash Join (cost=158.03..20524.10 rows=4357 width=23) (actual time=2.101..311.200 rows=3777 loops=1)
Hash Cond: ((ch.client_id)::text = (gc.code)::text)
Buffers: shared hit=7280
-> Seq Scan on client_hierarchy ch (cost=0.00..15955.00 rows=873500 width=10) (actual time=0.008..161.576 rows=873349 loops=1)
Filter: active
Buffers: shared hit=7220
-> Hash (cost=103.57..103.57 rows=4357 width=13) (actual time=1.969..1.969 rows=4391 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 262kB
Buffers: shared hit=60
-> Seq Scan on gbo_client gc (cost=0.00..103.57 rows=4357 width=13) (actual time=0.004..1.016 rows=4391 loops=1)
Filter: active
Buffers: shared hit=60
-> Hash (cost=953.95..953.95 rows=5430 width=9) (actual time=2803.112..2803.112 rows=5563 loops=1)
Buckets: 8192 Batches: 1 Memory Usage: 287kB
Buffers: shared hit=506
-> Seq Scan on directory (cost=0.00..953.95 rows=5430 width=9) (actual time=1.353..2800.528 rows=5563 loops=1)
Filter: (active AND ((numb)::text ~ '^(?:(?:AACM|AACO).*)$'::text))
Rows Removed by Filter: 30318
Buffers: shared hit=506
Planning time: 8.529 ms
Execution time: 3118.224 ms
Run Code Online (Sandbox Code Playgroud)
不需要保证顺序,只需要删除重复的行 DISTINCT ON,因为是同一个客户端......它们是脏数据。
有没有可能优化这个?
根据这里的反馈,我的新查询:
SELECT client.numb, client.code, ch.client_id, ch.client_parent_id
FROM (
SELECT DISTINCT ON (gc.number) gc.number, gc.code
FROM client gc
WHERE EXISTS (
SELECT *
FROM client_hierarchy ch
WHERE ch.client_id = gc.code
)
AND EXISTS (
SELECT *
FROM bank_directory
WHERE active AND numb = gc.number
)
AND numb ~ '^(?:AACM|AACO)'
) as client
INNER JOIN client_hierarchy ch ON client.code = ch.client_id;
Run Code Online (Sandbox Code Playgroud)
新EXPLAIN (ANALYZE, BUFFERS):
Nested Loop (cost=11080.16..20448.74 rows=1251 width=23) (actual time=368.303..373.574 rows=707 loops=1)
Buffers: shared hit=8236
-> Unique (cost=11079.73..11086.15 rows=1251 width=13) (actual time=368.286..368.684 rows=707 loops=1)
Buffers: shared hit=5402
-> Sort (cost=11079.73..11082.94 rows=1284 width=13) (actual time=368.285..368.410 rows=998 loops=1)
Sort Key: gc.number
Sort Method: quicksort Memory: 71kB
Buffers: shared hit=5402
-> Hash Semi Join (cost=1312.74..11013.44 rows=1284 width=13) (actual time=19.826..366.642 rows=998 loops=1)
Hash Cond: ((gc.number)::text = (directory.numb)::text)
Buffers: shared hit=5402
-> Nested Loop Semi Join (cost=0.42..9683.47 rows=1284 width=13) (actual time=0.987..346.895 rows=1070 loops=1)
Buffers: shared hit=4896
-> Seq Scan on client gc (cost=0.00..114.46 rows=1284 width=13) (actual time=0.857..327.979 rows=1252 loops=1)
Filter: ((numb)::text ~ '^(?:AACM|AACO)'::text)
Rows Removed by Filter: 3139
Buffers: shared hit=60
-> Index Only Scan using client_id_hierarchy_index on client_hierarchy ch_1 (cost=0.42..7.44 rows=1 width=5) (actual time=0.014..0.014 rows=1 loops=1252)
Index Cond: (client_id = (gc.code)::text)
Heap Fetches: 1070
Buffers: shared hit=4836
-> Hash (cost=864.36..864.36 rows=35836 width=9) (actual time=18.790..18.790 rows=35881 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 1949kB
Buffers: shared hit=506
-> Seq Scan on directory (cost=0.00..864.36 rows=35836 width=9) (actual time=0.010..11.200 rows=35881 loops=1)
Filter: active
Buffers: shared hit=506
-> Index Scan using client_id_hierarchy_index on client_hierarchy ch (cost=0.42..7.46 rows=1 width=10) (actual time=0.006..0.006 rows=1 loops=707)
Index Cond: ((client_id)::text = (gc.code)::text)
Buffers: shared hit=2834
Planning time: 63.201 ms
Execution time: 373.721 ms
Run Code Online (Sandbox Code Playgroud)
现在我在 300-400 毫秒内得到结果。从 3.5 秒下降。
我们实际上没有查询的选择性和成本来优化它。您需要EXPLAIN (ANALYZE, BUFFERS)像评论中提到的那样。说了这么多,有几点..
WHERE ACTIVE = TRUE正在触发 seq 扫描(WHERE active为了清楚起见,您也可以这样说)。您可以在活动上添加索引,这可能会产生更快的结果。EXISTS方式,使用哪个查询可能会更快。numbnumb, 可能适合将 变成enum更快。DISTINCT ON。您可能会发现在这里进行两个现有测试会更快。像这样
SELECT gc.number, gc.code, ch.client_id, ch.client_parent_id
FROM client gc
WHERE EXISTS (
SELECT *
FROM client_hierarchy ch
WHERE ch.client_id = gc.code
)
AND EXISTS (
SELECT *
FROM directory
WHERE active
AND numb = gc.number
AND numb SIMILAR TO '(ABTR|GREW|POEW)%'
);
Run Code Online (Sandbox Code Playgroud)
而且,这样写就很明显,这很尴尬,可以推下来。
AND numb = gc.number
AND numb SIMILAR TO '(ABTR|GREW|POEW)%'
Run Code Online (Sandbox Code Playgroud)
进入这个..
SELECT gc.number, gc.code, ch.client_id, ch.client_parent_id
FROM client gc
WHERE EXISTS (
SELECT *
FROM client_hierarchy ch
WHERE ch.client_id = gc.code
)
AND EXISTS (
SELECT *
FROM directory
WHERE active
AND numb = gc.number
)
AND number SIMILAR TO '(ABTR|GREW|POEW)%';
Run Code Online (Sandbox Code Playgroud)
最后SIMILAR TO可能永远不应该使用。有关更多信息,请参阅 Erwin 的这篇精彩文章。
| 归档时间: |
|
| 查看次数: |
123 次 |
| 最近记录: |