var*_*ran 1 postgresql performance upsert
I have two tables:
tmp_stat:
date, site_id, ip, block_id, count
Primary Key (date, site_id, ip, block_id)
main_stat:
date, site_id, ip, block_id, count
Primary Key (date, site_id, ip, block_id)
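I need to insert rows from tmp_stat into main_stat when no row with the same key (date, site_id, ip, block_id) exists yet, and update the count when one already exists, as quickly as possible.
tmp_stat contains about 500,000 rows; main_stat contains millions.
Will the following work?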
WITH upd AS (
UPDATE main_stat t
SET counter = s.counter
FROM tmp_stat s
WHERE t.date = s.date
AND t.site_id = s.site_id
AND t.ip = s.ip
AND t.block_id = s.block_id
RETURNING s.date, s.site_id, s.ip, s.block_id, s.counter
)
INSERT INTO main_stat
SELECT s.date, s.site_id, s.ip, s.block_id, s.counter
FROM tmp_stat s
LEFT JOIN upd ON (upd.date = s.date and upd.site_id = s.site_id and upd.ip = s.ip and upd.block_id = s.block_id)
WHERE upd.date IS NULL
;
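Update:
It looks like this only works on PostgreSQL 9.1 or later, since it relies on a data-modifying CTE.
Following just-somebody's suggestion, writing the join condition as WHERE (t.date, t.site_id, t.ip, t.block_id) = (s.date, s.site_id, s.ip, s.block_id) seems to give better performance.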
WITH upd AS (
UPDATE main_stat t
SET counter = s.counter
FROM tmp_stat s
WHERE ( t.date, t.site_id, t.ip, t.block_id ) = ( s.date, s.site_id, s.ip, s.block_id )
RETURNING s.date, s.site_id, s.ip, s.block_id
)
INSERT INTO main_stat
SELECT s.date, s.site_id, s.ip, s.block_id, s.counter
FROM tmp_stat s
LEFT JOIN upd
ON ( upd.date = s.date
AND upd.site_id = s.site_id
AND upd.ip = s.ip
AND upd.block_id = s.block_id )
WHERE upd.date IS NULL
;
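What happens here is that the UPDATE runs inside a CTE, and the CTE returns the identifying (key) columns of the rows it updated.
The INSERT then uses that updated-row information to filter tmp_stat, so that only rows that were not updated, i.e. new records, are inserted.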
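For servers older than 9.1, where a data-modifying CTE is not available, the same two steps can be sketched as two separate statements in one transaction. This is only an illustration, not something tested against the tables above: it assumes the counter column is named counter, as in the queries (the table listing calls it count), and it swaps the LEFT JOIN against upd for a NOT EXISTS check against main_stat itself.

```sql
BEGIN;

-- Step 1: update the rows that already exist in main_stat.
UPDATE main_stat t
SET counter = s.counter
FROM tmp_stat s
WHERE t.date = s.date
  AND t.site_id = s.site_id
  AND t.ip = s.ip
  AND t.block_id = s.block_id;

-- Step 2: insert only the rows whose key is not yet in main_stat.
INSERT INTO main_stat (date, site_id, ip, block_id, counter)
SELECT s.date, s.site_id, s.ip, s.block_id, s.counter
FROM tmp_stat s
WHERE NOT EXISTS (
    SELECT 1
    FROM main_stat t
    WHERE t.date = s.date
      AND t.site_id = s.site_id
      AND t.ip = s.ip
      AND t.block_id = s.block_id
);

COMMIT;
```

Like the CTE version, this does not by itself protect against another session inserting the same key between the two steps.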
Dimitri Fontaine covers some concurrency caveats in this blog post.
More information on CTEs can be found in the PostgreSQL documentation.
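On the concurrency point: one blunt way to avoid races with other writers, assuming it is acceptable to block them for the duration of the load, is to take a table lock before running the statement. The sketch below is an assumption about how this could be wired up, not a description of what the blog post recommends; it again assumes the column is named counter.

```sql
BEGIN;

-- SHARE ROW EXCLUSIVE conflicts with the ROW EXCLUSIVE lock taken by
-- INSERT/UPDATE/DELETE, so other writers wait until COMMIT.
-- Plain SELECTs on main_stat are not blocked.
LOCK TABLE main_stat IN SHARE ROW EXCLUSIVE MODE;

WITH upd AS (
    UPDATE main_stat t
    SET counter = s.counter
    FROM tmp_stat s
    WHERE (t.date, t.site_id, t.ip, t.block_id)
        = (s.date, s.site_id, s.ip, s.block_id)
    RETURNING s.date, s.site_id, s.ip, s.block_id
)
INSERT INTO main_stat (date, site_id, ip, block_id, counter)
SELECT s.date, s.site_id, s.ip, s.block_id, s.counter
FROM tmp_stat s
LEFT JOIN upd
       ON (upd.date, upd.site_id, upd.ip, upd.block_id)
        = (s.date, s.site_id, s.ip, s.block_id)
WHERE upd.date IS NULL;

COMMIT;
```

The trade-off is throughput: every other session that writes to main_stat is serialized behind this transaction, so this fits a periodic batch load better than a table under constant concurrent writes.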