Ale*_*nko 4 sql postgresql performance
I have a complex query:
SELECT DISTINCT ON (delivery.id)
delivery.id, dl_processing.pid
FROM mailer.mailer_message_recipient_rel AS delivery
JOIN mailer.mailer_message AS message ON delivery.message_id = message.id
JOIN mailer.mailer_message_recipient_rel_log AS dl_processing ON dl_processing.rel_id = delivery.id AND dl_processing.status = 1000
-- LEFT JOIN mailer.mailer_recipient AS r ON delivery.email = r.email
JOIN mailer.mailer_mailing AS mailing ON message.mailing_id = mailing.id
WHERE
NOT EXISTS (SELECT dl_finished.id FROM mailer.mailer_message_recipient_rel_log AS dl_finished WHERE dl_finished.rel_id = delivery.id AND dl_finished.status <> 1000) AND
dl_processing.date <= NOW() - (36000 * INTERVAL '1 second') AND
NOT EXISTS (SELECT ml.id FROM mailer.mailer_message_log AS ml WHERE ml.message_id = message.id) AND
-- (r.times_bounced < 5 OR r.times_bounced IS NULL) AND
NOT EXISTS (SELECT ur.id FROM mailer.mailer_unsubscribed_recipient AS ur WHERE ur.email = delivery.email AND ur.list_id = mailing.list_id)
ORDER BY delivery.id, dl_processing.id DESC
LIMIT 1000;
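It runs very slowly, and the reason seems to be that Postgres consistently avoids merge joins in its query plan, even though I have all the indexes they would need. The plan looks really depressing: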
http://explain.depesz.com/s/tVY

http://i.stack.imgur.com/Myw4R.png
Why is that? How do I fix this kind of problem?
UPD: With @wildplasser's help I reworked the query to fix its performance (changing its semantics slightly along the way):
SELECT delivery.id, dl_processing.pid
FROM mailer.mailer_message_recipient_rel AS delivery
JOIN mailer.mailer_message AS message ON delivery.message_id = message.id
JOIN mailer.mailer_message_recipient_rel_log AS dl_processing ON dl_processing.rel_id = delivery.id AND dl_processing.status in (1000, 2, 5) AND dl_processing.date <= NOW() - (36000 * INTERVAL '1 second')
LEFT JOIN mailer.mailer_recipient AS r ON delivery.email = r.email
WHERE
(r.times_bounced < 5 OR r.times_bounced IS NULL) AND
NOT EXISTS (SELECT dl_other.id FROM mailer.mailer_message_recipient_rel_log AS dl_other WHERE dl_other.rel_id = delivery.id AND dl_other.id > dl_processing.id) AND
NOT EXISTS (SELECT ml.id FROM mailer.mailer_message_log AS ml WHERE ml.message_id = message.id) AND
NOT EXISTS (SELECT ur.id FROM mailer.mailer_unsubscribed_recipient AS ur JOIN mailer.mailer_mailing AS mailing ON message.mailing_id = mailing.id WHERE ur.email = delivery.email AND ur.list_id = mailing.list_id)
ORDER BY delivery.id
LIMIT 1000
Now it runs well, but the query plan still uses those horrible nested loop joins <_<:
http://explain.depesz.com/s/MTo3
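I would still like to know why.
The reason is that Postgres was actually doing the right thing, and I suck at math. Suppose table A has N rows and table B has M rows, and they are joined on a column for which both tables have a B-tree index. Then the following is true: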
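[...] needs to be done only when a particular row order is required because of an ORDER clause, so we will see that it is not a bad thing at all. So basically, despite its association with merge sort, which we all love, merge join almost always sucks.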
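My first query was so slow because it had to perform a sort before applying the limit, and it was bad in many other ways too. After applying @wildplasser's suggestions, I managed to reduce the number of (still expensive) nested loops and to let the limit be applied without sorting. That makes it most likely that Postgres will not need to run the outer scan to completion, which is where I got most of the performance gain.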
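If you want to check for yourself whether the planner's choice of nested loops is really the cheaper one, here is a rough sketch of how you could compare the plans. It relies on the standard PostgreSQL planner setting `enable_nestloop`, which only discourages nested loops rather than forbidding them. The query is a trimmed-down stand-in for the reworked one above, kept only for illustration, so treat the whole thing as an experiment and not as a fix:

```sql
-- Plan and timing with the planner's default choices.
EXPLAIN (ANALYZE, BUFFERS)
SELECT delivery.id, dl_processing.pid
FROM mailer.mailer_message_recipient_rel AS delivery
JOIN mailer.mailer_message_recipient_rel_log AS dl_processing
  ON dl_processing.rel_id = delivery.id AND dl_processing.status = 1000
ORDER BY delivery.id
LIMIT 1000;

-- Same query with nested loops penalized, for this transaction only.
BEGIN;
SET LOCAL enable_nestloop = off;  -- steers the planner toward merge or hash joins

EXPLAIN (ANALYZE, BUFFERS)
SELECT delivery.id, dl_processing.pid
FROM mailer.mailer_message_recipient_rel AS delivery
JOIN mailer.mailer_message_recipient_rel_log AS dl_processing
  ON dl_processing.rel_id = delivery.id AND dl_processing.status = 1000
ORDER BY delivery.id
LIMIT 1000;

ROLLBACK;  -- SET LOCAL is undone when the transaction ends
```

If the second plan turns out slower, that supports the conclusion above: with a small LIMIT and no extra sort, the nested loop can stop early, while the alternatives have to read much more of both inputs. Note that EXPLAIN ANALYZE actually executes the query, so expect each run to take as long as the query itself.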