如何批量更新 postgres 中的单个列 5500 万条记录

cod*_*eek 9 sql postgresql plpgsql

我想更新 postgres 表的一列。记录大约有 5500 万条,因此我们需要批量更新 10000 条记录。注意:我们要更新所有行。但我们不想锁定我们的桌子。

我正在尝试以下查询 -

Update account set name = Some name where id between 1 and 10000
Run Code Online (Sandbox Code Playgroud)

我们如何为每 10000 条记录更新创建一个循环?

任何建议和帮助将不胜感激。

PostgreSQL 10.5

Jim*_*nes 24

我宁愿尝试将更新行分成小批量,例如您建议的 10k 记录,而不是一次提交所有更改(或按照其他答案中建议的 5500 万次)。在 PL/pgSQL 中,可以使用关键字以给定步骤迭代集合BY。所以你可以像这样进行批量更新anonymous code block

PostgreSQL 11+

DO $$ 
DECLARE 
  page int := 10000;
  min_id bigint; max_id bigint;
BEGIN
  SELECT max(id),min(id) INTO max_id,min_id FROM account;
  FOR j IN min_id..max_id BY page LOOP 
    UPDATE account SET name = 'your magic goes here'
    WHERE id >= j AND id < j+page;
    COMMIT;            
  END LOOP;
END; $$;
Run Code Online (Sandbox Code Playgroud)
  • 您可能需要调整该WHERE条款以避免不必要的重叠。

测试

具有 1051 行且具有连续 ID 的数据样本:

CREATE TABLE account (id int, name text);
INSERT INTO account VALUES(generate_series(0,1050),'untouched record..');
Run Code Online (Sandbox Code Playgroud)

执行匿名代码块...

DO $$ 
DECLARE 
  page int := 100;
  min_id bigint; max_id bigint;
BEGIN
  SELECT max(id),min(id) INTO max_id,min_id FROM account;
  FOR j IN min_id..max_id BY page LOOP 
    UPDATE account SET name = now() ||' -> UPDATED ' || j  || ' to ' || j+page
    WHERE id >= j AND id < j+page;
    RAISE INFO 'committing data from % to % at %', j,j+page,now();
    COMMIT;            
  END LOOP;
END; $$;
    
INFO:  committing data from 0 to 100 at 2021-04-14 17:35:42.059025+02
INFO:  committing data from 100 to 200 at 2021-04-14 17:35:42.070274+02
INFO:  committing data from 200 to 300 at 2021-04-14 17:35:42.07806+02
INFO:  committing data from 300 to 400 at 2021-04-14 17:35:42.087201+02
INFO:  committing data from 400 to 500 at 2021-04-14 17:35:42.096548+02
INFO:  committing data from 500 to 600 at 2021-04-14 17:35:42.105876+02
INFO:  committing data from 600 to 700 at 2021-04-14 17:35:42.114514+02
INFO:  committing data from 700 to 800 at 2021-04-14 17:35:42.121946+02
INFO:  committing data from 800 to 900 at 2021-04-14 17:35:42.12897+02
INFO:  committing data from 900 to 1000 at 2021-04-14 17:35:42.134388+02
INFO:  committing data from 1000 to 1100 at 2021-04-14 17:35:42.13951+02
Run Code Online (Sandbox Code Playgroud)

..您可以批量更新行。为了使我的观点更清楚,以下查询计算按更新时间分组的记录数:

SELECT DISTINCT ON (name) name, count(id)
FROM account 
GROUP BY name ORDER BY name;

                         name                         | count 
------------------------------------------------------+-------
 2021-04-14 17:35:42.059025+02 -> UPDATED 0 to 100    |   100
 2021-04-14 17:35:42.070274+02 -> UPDATED 100 to 200  |   100
 2021-04-14 17:35:42.07806+02 -> UPDATED 200 to 300   |   100
 2021-04-14 17:35:42.087201+02 -> UPDATED 300 to 400  |   100
 2021-04-14 17:35:42.096548+02 -> UPDATED 400 to 500  |   100
 2021-04-14 17:35:42.105876+02 -> UPDATED 500 to 600  |   100
 2021-04-14 17:35:42.114514+02 -> UPDATED 600 to 700  |   100
 2021-04-14 17:35:42.121946+02 -> UPDATED 700 to 800  |   100
 2021-04-14 17:35:42.12897+02 -> UPDATED 800 to 900   |   100
 2021-04-14 17:35:42.134388+02 -> UPDATED 900 to 1000 |   100
 2021-04-14 17:35:42.13951+02 -> UPDATED 1000 to 1100 |    51
Run Code Online (Sandbox Code Playgroud)

演示:db<>fiddle


Fra*_*ens 2

您可以使用一个过程(从版本 11 开始提供)并逐一执行,如下所示:

CREATE or replace PROCEDURE do_update()
LANGUAGE plpgsql
AS $$
BEGIN
    FOR i IN 1..55000000 -- 55 million, or whatever number you need
    LOOP 

        Update account set name = Some name where id = i;
        COMMIT;
        
        RAISE INFO 'id: %', i;
    END LOOP;
END;
$$;

CALL do_update();
Run Code Online (Sandbox Code Playgroud)