Asked by Div*_*ick · Tags: postgresql, performance, postgresql-performance
I'm using Postgres, and I'm seeing that a query using ORDER BY on two columns is orders of magnitude slower than the same query ordering by only one column. The table in question has about 29.5 million rows.
Here are the results for three different queries:
Ordering by id only:
EXPLAIN ANALYZE SELECT "api_meterdata"."id", "api_meterdata"."meter_id", "api_meterdata"."datetime", "api_meter"."id" FROM "api_meterdata" INNER JOIN "api_meter" ON ( "api_meterdata"."meter_id" = "api_meter"."id" ) ORDER BY "api_meterdata"."id" DESC LIMIT 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.44..321.49 rows=100 width=20) (actual time=0.407..30.424 rows=100 loops=1)
-> Nested Loop (cost=0.44..94824299.30 rows=29535145 width=20) (actual time=0.402..30.090 rows=100 loops=1)
Join Filter: (api_meterdata.meter_id = api_meter.id)
Rows Removed by Join Filter: 8147
-> Index Scan Backward using api_meterdata_pkey on api_meterdata (cost=0.44..58053041.74 rows=29535145 width=16) (actual time=0.103..0.867 rows=100 loops=1)
-> Materialize (cost=0.00..2.25 rows=83 width=4) (actual time=0.002..0.144 rows=82 loops=100)
 -> Seq Scan on api_meter (cost=0.00..1.83 rows=83 width=4) (actual time=0.008..0.153 rows=83 loops=1)
Planning time: 0.491 ms
Execution time: 30.701 ms
(9 rows)
Ordering by datetime only:
EXPLAIN ANALYZE SELECT "api_meterdata"."id", "api_meterdata"."meter_id", "api_meterdata"."datetime", "api_meter"."id" FROM "api_meterdata" INNER JOIN "api_meter" ON ( "api_meterdata"."meter_id" = "api_meter"."id" ) ORDER BY "api_meterdata"."datetime" ASC LIMIT 100;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.44..321.50 rows=100 width=20) (actual time=1.245..37.054 rows=100 loops=1)
-> Nested Loop (cost=0.44..94825493.68 rows=29535313 width=20) (actual time=1.238..36.652 rows=100 loops=1)
Join Filter: (api_meterdata.meter_id = api_meter.id)
Rows Removed by Join Filter: 8148
-> Index Scan using api_meterdata_datetime_index on api_meterdata (cost=0.44..58054026.95 rows=29535313 width=16) (actual time=0.851..1.501 rows=100 loops=1)
-> Materialize (cost=0.00..2.25 rows=83 width=4) (actual time=0.002..0.172 rows=82 loops=100)
-> Seq Scan on api_meter (cost=0.00..1.83 rows=83 width=4) (actual time=0.013..0.192 rows=83 loops=1)
Planning time: 0.483 ms
Execution time: 37.340 ms
(9 rows)
Ordering by both datetime and id:
EXPLAIN ANALYZE SELECT "api_meterdata"."id", "api_meterdata"."meter_id", "api_meterdata"."datetime", "api_meter"."id" FROM "api_meterdata" INNER JOIN "api_meter" ON ( "api_meterdata"."meter_id" = "api_meter"."id" ) ORDER BY "api_meterdata"."datetime" ASC, "api_meterdata"."id" DESC LIMIT 100;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=3064122.28..3064122.53 rows=100 width=20) (actual time=146772.167..146772.372 rows=100 loops=1)
-> Sort (cost=3064122.28..3137955.90 rows=29533446 width=20) (actual time=146772.164..146772.242 rows=100 loops=1)
Sort Key: api_meterdata.datetime, api_meterdata.id
Sort Method: top-N heapsort Memory: 32kB
-> Hash Join (cost=2.87..1935375.21 rows=29533446 width=20) (actual time=0.394..113349.364 rows=29535544 loops=1)
Hash Cond: (api_meterdata.meter_id = api_meter.id)
-> Seq Scan on api_meterdata (cost=0.00..1529287.46 rows=29533446 width=16) (actual time=0.220..47537.991 rows=29535544 loops=1)
-> Hash (cost=1.83..1.83 rows=83 width=4) (actual time=0.160..0.160 rows=83 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 3kB
-> Seq Scan on api_meter (cost=0.00..1.83 rows=83 width=4) (actual time=0.005..0.071 rows=83 loops=1)
Planning time: 0.290 ms
Execution time: 146772.500 ms
(12 rows)
Here are the indexes on the table:
SELECT * FROM pg_indexes WHERE tablename = 'api_meterdata';
schemaname | tablename | indexname | tablespace | indexdef
------------+---------------+----------------------------------------------+------------+-----------------------------------------------------------------------------------------------------
---------------
public | api_meterdata | api_meterdata_meter_id_36fe63013b50049f_uniq | | CREATE UNIQUE INDEX api_meterdata_meter_id_36fe63013b50049f_uniq ON api_meterdata USING btree (meter_id, datetime)
public | api_meterdata | api_meterdata_pkey | | CREATE UNIQUE INDEX api_meterdata_pkey ON api_meterdata USING btree (id)
public | api_meterdata | api_meterdata_f7a5de1d | | CREATE INDEX api_meterdata_f7a5de1d ON api_meterdata USING btree (meter_id)
public | api_meterdata | api_meterdata_datetime_index | | CREATE INDEX api_meterdata_datetime_index ON api_meterdata USING btree (datetime)
(4 rows)
I can see that it's the Sort step that takes the longest, but I don't know why.
The difference in timing comes down to a few facts:
- Your queries don't have a WHERE clause filtering which results to retrieve.
- They do have a LIMIT clause.
- If the rows of the query can be retrieved in the order specified by the ORDER BY, PostgreSQL will start picking them one by one until it has read 100 (the number specified by your LIMIT clause) and then stop.
- For that, the index, if multi-column, needs to have the same columns, in the same order, and with the same ASC/DESC sort directions (or all of them reversed).
- When no such index exists, all the data needs to be retrieved (not just the 100 already-ordered rows), all of it has to be joined, and only then can the Sort step run. That is what causes such a large performance difference. This can be seen clearly with explain.depesz.com.
You can find a simulation of your scenario in the dbfiddle here, which covers and explains the different cases and includes a suggestion from @ypercube as an additional index. Note also that some of your indexes are redundant.
The DDL for your scenario, plus some simulated data:
CREATE TABLE api_meter
(
id INTEGER PRIMARY KEY
) ;
INSERT INTO
api_meter
(id)
SELECT
generate_series(1, 83) ;
...and the table holding your meter data:
CREATE TABLE api_meterdata
(
id serial /* integer */ PRIMARY KEY,
meter_id integer REFERENCES api_meter(id),
datetime timestamp NOT NULL default now()
) ;
-- The PK will have made an implicit index ON (id)
-- Index on (meter_id, datetime); which is probably the *NATURAL KEY*
CREATE UNIQUE INDEX api_meterdata_meter_id_datetime_unique
ON api_meterdata (meter_id, datetime) ;
-- The following index is redundant, the column meter_id is already the first in
-- the previous one.
-- CREATE INDEX api_meterdata_meter_id_idx
-- ON api_meterdata (meter_id) ;
CREATE INDEX api_meterdata_datetime_idx
ON api_meterdata (datetime) ;
...and some simulated data (648,001 rows, to make it realistic). That's less data than you have, but DBFiddle hits its limits if I try to load more:
INSERT INTO
api_meterdata
(meter_id, datetime)
SELECT
random()*82+1, d
FROM
generate_series(timestamp '2017-01-01', timestamp '2017-01-31',
interval '4 second') AS s(d);
-- Make sure statistics are good
ANALYZE api_meterdata;
ANALYZE api_meter;
Analysis of your first query:
-- This query doesn't have a WHERE clause, so, indexes will be used based on
-- ORDER BY + LIMIT (and, eventually, column coverage)
--
-- * The index helping this case is the one corresponding to the PK of
-- api_meter_data, used in DESC order
-- * A second index will help: the one used for the JOIN condition
-- * How does postgresql choose to JOIN will depend on specific data values
-- distribution, sizes, etc.
EXPLAIN ANALYZE
SELECT
api_meterdata.id, api_meterdata.meter_id, api_meterdata.datetime,
api_meter.id
FROM
api_meterdata
INNER JOIN api_meter ON ( api_meterdata.meter_id = api_meter.id )
ORDER BY
api_meterdata.id DESC
LIMIT 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..20.71 rows=100 width=20) (actual time=0.033..0.188 rows=100 loops=1)
  -> Nested Loop (cost=0.57..130514.61 rows=648001 width=20) (actual time=0.031..0.175 rows=100 loops=1)
    -> Index Scan using api_meterdata_pkey on api_meterdata (cost=0.42..20342.44 rows=648001 width=16) (actual time=0.023..0.038 rows=100 loops=1)
    -> Index Only Scan using api_meter_pkey on api_meter (cost=0.14..0.16 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=100)
       Index Cond: (id = api_meterdata.meter_id)
       Heap Fetches: 100
Planning time: 0.331 ms
Execution time: 0.216 ms
Analysis of the second query:
-- This query doesn't have either a WHERE clause, so, indexes will be used
-- based on ORDER BY + LIMIT (and, eventually, column coverage).
-- * The index helping this case is the one corresponding to
-- ON api_meterdata (datetime), because that's the only column used in the
-- ORDER BY.
-- * A second index will help: the one used for the JOIN condition
-- * How does postgresql choose to JOIN will depend on specific data values
-- distribution
EXPLAIN ANALYZE
SELECT
api_meterdata.id, api_meterdata.meter_id, api_meterdata.datetime,
api_meter.id
FROM
api_meterdata
INNER JOIN api_meter ON ( api_meterdata.meter_id = api_meter.id )
ORDER BY
api_meterdata.datetime ASC
LIMIT 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..20.71 rows=100 width=20) (actual time=0.041..0.201 rows=100 loops=1)
  -> Nested Loop (cost=0.57..130514.61 rows=648001 width=20) (actual time=0.040..0.182 rows=100 loops=1)
    -> Index Scan using api_meterdata_datetime_idx on api_meterdata (cost=0.42..20342.44 rows=648001 width=16) (actual time=0.036..0.048 rows=100 loops=1)
    -> Index Only Scan using api_meter_pkey on api_meter (cost=0.14..0.16 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=100)
       Index Cond: (id = api_meterdata.meter_id)
       Heap Fetches: 100
Planning time: 0.113 ms
Execution time: 0.224 ms
Analysis of your third query, without and then with the suggested index:
-- This query doesn't have either a WHERE clause.
-- Again indexes will be used based on ORDER BY + LIMIT
-- * The index that would mostly help this case would be one with
--   (datetime ASC, id DESC).
--   But there's none in place. An index on (datetime) alone will not be good
--   enough, because the second condition in the ORDER BY needs to be evaluated
--   before the LIMIT can be computed. That is, a SORT will be needed.
-- * A second index will help: the one used for the JOIN condition
-- * How does postgresql choose to JOIN will depend on specific data values
-- distribution, as always.
--
-- This query performs MUCH WORSE than the previous one.
EXPLAIN ANALYZE
SELECT api_meterdata.id, api_meterdata.meter_id, api_meterdata.datetime, api_meter.id
FROM api_meterdata
INNER JOIN api_meter ON ( api_meterdata.meter_id = api_meter.id )
ORDER BY api_meterdata.datetime ASC, api_meterdata.id DESC
LIMIT 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=43662.02..43662.27 rows=100 width=20) (actual time=377.202..377.222 rows=100 loops=1)
  -> Sort (cost=43662.02..45282.03 rows=648001 width=20) (actual time=377.202..377.210 rows=100 loops=1)
       Sort Key: api_meterdata.datetime, api_meterdata.id DESC
       Sort Method: top-N heapsort  Memory: 32kB
       -> Hash Join (cost=2.87..18895.89 rows=648001 width=20) (actual time=0.034..270.809 rows=648001 loops=1)
            Hash Cond: (api_meterdata.meter_id = api_meter.id)
            -> Seq Scan on api_meterdata (cost=0.00..9983.01 rows=648001 width=16) (actual time=0.007..75.104 rows=648001 loops=1)
            -> Hash (cost=1.83..1.83 rows=83 width=4) (actual time=0.023..0.023 rows=83 loops=1)
                 Buckets: 1024  Batches: 1  Memory Usage: 11kB
                 -> Seq Scan on api_meter (cost=0.00..1.83 rows=83 width=4) (actual time=0.002..0.009 rows=83 loops=1)
Planning time: 0.123 ms
Execution time: 377.251 ms
Creating the index (and dropping the redundant one):
-- We DROP one of the indexes... which will become redundant
-- CREATE INDEX api_meterdata_datetime_idx ON api_meterdata (datetime) ;
DROP INDEX api_meterdata_datetime_idx ;
-- And create one with two columns, ordered in the same fashion needed by the query
CREATE INDEX api_meterdata_datetime_idx
ON api_meterdata (datetime ASC, id DESC) ;
Analysis of the query in the new scenario:
--
-- We put in place the required index
--
-- This query is again fast, and has an execution plan equivalent in
-- structure to the two first ones. No SORT phase is needed, because rows are
-- already retrieved in the correct order, and once the LIMIT is reached, no
-- more rows are read from (disk/cache)
--
EXPLAIN ANALYZE
SELECT
api_meterdata.id, api_meterdata.meter_id, api_meterdata.datetime,
api_meter.id
FROM
api_meterdata
INNER JOIN api_meter ON ( api_meterdata.meter_id = api_meter.id )
ORDER BY
api_meterdata.datetime ASC, api_meterdata.id DESC
LIMIT 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..21.86 rows=100 width=20) (actual time=0.019..0.229 rows=100 loops=1)
  -> Nested Loop (cost=0.57..137986.99 rows=648001 width=20) (actual time=0.018..0.214 rows=100 loops=1)
    -> Index Scan using api_meterdata_datetime_idx on api_meterdata (cost=0.42..27814.81 rows=648001 width=16) (actual time=0.013..0.040 rows=100 loops=1)
    -> Index Only Scan using api_meter_pkey on api_meter (cost=0.14..0.16 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=100)
       Index Cond: (id = api_meterdata.meter_id)
       Heap Fetches: 100
Planning time: 0.218 ms
Execution time: 0.262 ms
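Translating this back to the table from the question, a sketch of the equivalent fix (using the index names from the question's pg_indexes output; the new index name is made up) would look roughly like this:

```sql
-- Sketch only: an index whose columns, order, and directions match
-- ORDER BY datetime ASC, id DESC exactly.
CREATE INDEX api_meterdata_datetime_id_idx
    ON api_meterdata (datetime ASC, id DESC);

-- The single-column datetime index then becomes redundant...
DROP INDEX api_meterdata_datetime_index;
-- ...and api_meterdata_f7a5de1d (meter_id) was already redundant, since
-- meter_id leads the unique index on (meter_id, datetime).
DROP INDEX api_meterdata_f7a5de1d;
```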