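Postgres is performing suboptimal joins for several queries that have simple joins. It appears to be doing a Cartesian join and then discarding rows with a join filter, as the explain plan below shows: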
Insert on sierra (cost=7.350..26.120 rows=1 width=88) (actual rows=0 loops=1)
  -> Nested Loop (cost=7.350..26.080 rows=1 width=426) (actual rows=14356 loops=1)
       -> Nested Loop (cost=6.240..19.470 rows=1 width=304) (actual rows=14356 loops=1)
            Join Filter: (xray_two.echo_romeo = november_three.echo_romeo)
            -> Nested Loop Left Join (cost=6.240..18.380 rows=1 width=184) (actual rows=14356 loops=1)
                 -> Nested Loop (cost=3.970..11.980 rows=1 width=187) (actual rows=14356 loops=1)
                      Join Filter: ((lima.victor = xray_two.victor) AND (lima.three = xray_two.three))
                      Rows Removed by Join Filter: 3813025380
                      -> Index Scan using foxtrot on echo_two lima (cost=1.720..5.590 rows=1 width=185) (actual rows=14356 loops=1)
                           Index Cond: ((delta = 'uniform'::timestamp without time zone) AND (zulu_tango = 1))
                      -> Index Scan using november_hotel on mike xray_two (cost=2.250..6.330 rows=1 width=18) (actual rows=265606 loops=14356)
                           Index Cond: ((delta = 'uniform'::timestamp without time zone) AND (zulu_tango = 1))
                 -> Index Scan using zulu_six on india whiskey (cost=2.270..6.360 rows=1 width=17) (actual rows=1 loops=14356)
                      Index Cond: ((lima.delta = delta) AND (delta = 'uniform'::timestamp without time zone) AND (lima.victor = victor))
            -> Seq Scan on kilo november_three (cost=0.000..1.040 rows=1 width=124) (actual rows=1 loops=14356)
       -> Limit (cost=1.110..1.130 rows=1 width=130) (actual rows=1 loops=14356)
            -> Sort (cost=1.110..1.130 rows=1 width=130) (actual rows=1 loops=14356)
                 Sort Key: xray_delta.six DESC
                 Sort Method: quicksort Memory: 25kB
                 -> Seq Scan on xray_delta (cost=0.000..1.070 rows=1 width=130) (actual rows=1 loops=1)
                      Filter: (six < ('juliet'::cstring)::timestamp without time zone)
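You can see that the join filter removes nearly 4 billion rows. The planner estimates only 1 row (rows=1), but the index scan on november_hotel actually returns 265k rows. It then loops over those 265k rows 14k times. What causes the planner to choose such an inefficient join/filter strategy? A few things to note: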
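What causes Postgres to get the plan so wrong? I can work around the problem by running ANALYZE manually before executing the problematic query, but I don't think that fixes the underlying issue, which could then surface anywhere in my code.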
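For reference, the manual workaround described above amounts to something like the following minimal sketch, using the table names from the simplified example further down. The INSERT statements and the staging_table_a / staging_table_b source tables are hypothetical placeholders for however the new hours of data are actually loaded; the only point is that ANALYZE runs on the changed tables before any query that joins them is planned.

```sql
-- Placeholder load step: staging_table_a and staging_table_b are
-- hypothetical; substitute the real source of the new rows.
INSERT INTO table_a SELECT * FROM staging_table_a;
INSERT INTO table_b SELECT * FROM staging_table_b;

-- Refresh planner statistics for the tables that just changed, so the
-- next query is planned with row estimates that include the new data.
-- (One table per statement: listing several tables in one ANALYZE
-- requires PostgreSQL 11 or later.)
ANALYZE table_a;
ANALYZE table_b;
```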
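EDIT:
As requested, I have reduced the query to its most basic form: a SELECT * from two tables, joined on all of the indexed columns.
Method: I inserted a few new hours of data and immediately ran EXPLAIN ANALYZE. When it finished 4,000 seconds later, I ran the same EXPLAIN ANALYZE again (by which point Postgres had autovacuumed the tables), and it returned in half a second. The only difference is the actual number of rows returned by the index scan on table_b inside the nested loop.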
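One way to check whether the two runs differ in statistics freshness is to look at the table-level counters just before each EXPLAIN ANALYZE. This is a sketch that assumes PostgreSQL 9.4 or later (the n_mod_since_analyze column was added in 9.4). A large n_mod_since_analyze together with an old last_autoanalyze would indicate that the planner was still working from statistics gathered before the new hours of data arrived.

```sql
-- Rows changed since the last ANALYZE, plus when the last manual and
-- automatic ANALYZE ran, for the two tables in the query.
SELECT relname,
       n_live_tup,
       n_mod_since_analyze,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('table_a', 'table_b');
```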
Query:
explain analyze
select col.*, strat.*
FROM table_a col
JOIN table_b strat
ON (strat.cellkey = col.cellkey
AND strat.offerkey = col.offerkey
AND strat.strategykey = col.strategykey
AND strat.startdate = col.startdate)
where col.startdate = '2017-05-17 1700'
AND col.strategykey = 1;
First EXPLAIN ANALYZE (immediately after inserting the data):
Nested Loop (cost=4.51..13.48 rows=1 width=544) (actual time=6.210..4264064.949 rows=31169 loops=1)
  Join Filter: ((col.cellkey = strat.cellkey) AND (col.offerkey = strat.offerkey))
  Rows Removed by Join Filter: 8278642245
  -> Index Scan using table_a_1 on table_a col (cost=2.24..6.76 rows=1 width=494) (actual time=0.034..177.203 rows=31169 loops=1)
       Index Cond: ((startdate = '2017-05-17 17:00:00'::timestamp without time zone) AND (strategykey = 1))
  -> Index Scan using table_b_1 on table_b strat (cost=2.27..6.66 rows=1 width=50) (actual time=0.020..94.664 rows=265606 loops=31169)
       Index Cond: ((startdate = '2017-05-17 17:00:00'::timestamp without time zone) AND (strategykey = 1))
Planning time: 4.689 ms
Execution time: 4264074.251 ms
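The counts in this plan account for every comparison: each of the 31,169 outer rows from table_a was checked against all 265,606 rows returned by the inner scan of table_b, and exactly one match per outer row survived the join filter. The insert plan at the top follows the same pattern, with 14,356 outer rows against the same 265,606 inner rows. Here 8,278,642,245 and 3,813,025,380 are the "Rows Removed by Join Filter" values, and 31,169 and 14,356 are the rows each join actually returned:

$$
\begin{aligned}
31\,169 \times 265\,606 &= 8\,278\,673\,414 = 8\,278\,642\,245 + 31\,169 \\
14\,356 \times 265\,606 &= 3\,813\,039\,736 = 3\,813\,025\,380 + 14\,356
\end{aligned}
$$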
Second EXPLAIN ANALYZE (after autovacuum had processed the tables):
Nested Loop (cost=4.51..341069.90 rows=36588 width=545) (actual time=0.290..538.989 rows=31169 loops=1)
  -> Index Scan using table_a_1 on table_a col (cost=2.24..73371.98 rows=36662 width=495) (actual time=0.168..81.488 rows=31169 loops=1)
       Index Cond: ((startdate = '2017-05-17 17:00:00'::timestamp without time zone) AND (strategykey = 1))
  -> Index Scan using table_b_1 on table_b strat (cost=2.27..7.26 rows=1 width=50) (actual time=0.012..0.013 rows=1 loops=31169)
       Index Cond: ((startdate = '2017-05-17 17:00:00'::timestamp without time zone) AND (cellkey = col.cellkey) AND (strategykey = 1) AND (offerkey = col.offerkey))
Planning time: 10.053 ms
Execution time: 543.467 ms
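How should we deal with this? Data is inserted into these tables regularly, and even with super-aggressive vacuuming we cannot guarantee that every query we run will see freshly analyzed statistics.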
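If the tables are left to autovacuum, one knob that relates to the "aggressive vacuuming" mentioned above is the per-table analyze threshold, which can make autoanalyze fire after a fixed number of changed rows rather than after a fraction of an ever-growing table. The values below are hypothetical and would need tuning. This does not close the gap entirely: autovacuum still runs asynchronously (it visits each database roughly once per autovacuum_naptime, 1 minute by default), so a query issued right after a large insert can still be planned with stale statistics.

```sql
-- Hypothetical values: trigger autoanalyze once about 10,000 rows have
-- been inserted, updated, or deleted since the last ANALYZE, regardless
-- of table size (a scale factor of 0 removes the proportional part of
-- the threshold).
ALTER TABLE table_a SET (autovacuum_analyze_scale_factor = 0,
                         autovacuum_analyze_threshold = 10000);
ALTER TABLE table_b SET (autovacuum_analyze_scale_factor = 0,
                         autovacuum_analyze_threshold = 10000);
```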