Pan*_*rat 12 sql postgresql data-migration
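After a schema change, I have to migrate a large amount of existing data in a Postgres DB.
In the old schema, the country attribute was stored in the users table. It has now been moved into a separate addresses table: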
    users:
        country     # OLD
        address_id  # NEW [1:1 relation]
    addresses:
        id
        country
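The schema is actually more complex, and an address contains more than just the country. Therefore, every user needs to have their own address (1:1 relation).
When migrating the data, I'm having trouble setting the foreign keys in the users table after inserting the addresses: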
    INSERT INTO addresses (country)
    SELECT country FROM users WHERE address_id IS NULL
    RETURNING id;
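How do I propagate the IDs of the inserted rows and set the foreign key references in the users table?
The only solution I have come up with so far is to create a temporary user_id column in the addresses table and then update address_id: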
    UPDATE users SET address_id = a.id FROM addresses AS a
    WHERE users.id = a.user_id;
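However, this turned out to be extremely slow (despite using indexes on both users.id and addresses.user_id).
The users table contains about 3 million rows, 300k of which are missing an associated address.
Is there any other way to insert derived data into one table and set the foreign key reference to the inserted data in another table (without changing the schema itself)?
I'm using Postgres 8.3.14.
Thanks
I have now solved the problem by migrating the data with a Python/sqlalchemy script. That turned out to be much easier (for me) than trying to do it in SQL. Still, I'd be interested if anybody knows a way to process the RETURNING result of an INSERT statement in Postgres SQL.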
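For illustration, here is a minimal sketch of what such a row-by-row script could look like. The actual script is not shown in this thread, so everything below is an assumption: it uses Python's standard-library sqlite3 as a stand-in for Postgres and sqlalchemy, cursor.lastrowid in place of RETURNING id, and the hypothetical names migrate_addresses and demo.

```python
import sqlite3


def migrate_addresses(conn):
    """Create one address per user that lacks one; return how many users were linked."""
    cur = conn.cursor()
    # Fetch the work list first so the table is not modified while iterating over it.
    pending = cur.execute(
        "SELECT id, country FROM users WHERE address_id IS NULL"
    ).fetchall()
    for user_id, country in pending:
        cur.execute("INSERT INTO addresses (country) VALUES (?)", (country,))
        # lastrowid stands in for Postgres' INSERT ... RETURNING id.
        cur.execute(
            "UPDATE users SET address_id = ? WHERE id = ?",
            (cur.lastrowid, user_id),
        )
    conn.commit()
    return len(pending)


def demo():
    """Build a tiny in-memory schema (assumed layout) and run the migration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE addresses (id INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT,
                            address_id INTEGER REFERENCES addresses(id));
        INSERT INTO users (country) VALUES ('DE'), ('DE'), ('FR');
    """)
    linked = migrate_addresses(conn)
    rows = conn.execute(
        "SELECT u.id, u.country, a.country, u.address_id "
        "FROM users u JOIN addresses a ON a.id = u.address_id ORDER BY u.id"
    ).fetchall()
    return linked, rows
```

Because each address id is captured right after its own INSERT, users with the same country still end up with separate addresses. The price is one round trip per user, which is simple but not fast.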
Erw*_*ter 16
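The table users must have some primary key that you did not disclose. For the purpose of this answer I will name it users_id.
You can solve this rather elegantly with data-modifying CTEs, introduced in PostgreSQL 9.1.
If we can assume that country is unique, the whole operation is fairly simple: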
    WITH i AS (
       INSERT INTO addresses (country)
       SELECT country
       FROM   users
       WHERE  address_id IS NULL
       RETURNING id, country
       )
    UPDATE users u
    SET    address_id = i.id
    FROM   i
    WHERE  i.country = u.country;
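You mentioned version 8.3 in your question. If you have not upgraded in the meantime, consider doing so: 8.3 is reaching end of life.
Be that as it may, this is simple enough with version 8.3, too. You just need two statements: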
    INSERT INTO addresses (country)
    SELECT country
    FROM   users
    WHERE  address_id IS NULL;

    UPDATE users u
    SET    address_id = a.id
    FROM   addresses a
    WHERE  u.address_id IS NULL
    AND    a.country = u.country;
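If country is not unique, it gets more challenging. You could create just one address and link to it multiple times. But you did mention a 1:1 relationship, which rules out that convenient solution.
For version 9.1: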
    WITH s AS (
       SELECT users_id, country
            , row_number() OVER (PARTITION BY country) AS rn
       FROM   users
       WHERE  address_id IS NULL
       )
    , i AS (
       INSERT INTO addresses (country)
       SELECT country
       FROM   s
       RETURNING id, country
       )
    , r AS (
       SELECT *
            , row_number() OVER (PARTITION BY country) AS rn
       FROM   i
       )
    UPDATE users u
    SET    address_id = r.id
    FROM   r
    JOIN   s USING (country, rn)    -- select exactly one id for every user
    WHERE  u.users_id = s.users_id
    AND    u.address_id IS NULL;
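Since there is no way to unambiguously assign exactly one id returned from the INSERT to each user in a set sharing the same country, I use the window function row_number() to make them unique.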
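To illustrate the pairing idea outside of SQL, here is a small pure-Python sketch. It numbers rows within each country and then matches on (country, rn), which is roughly what the CTEs s and r plus JOIN ... USING (country, rn) do. The function names and tuple layouts are made up for illustration.

```python
from collections import defaultdict


def number_within_group(rows, key_index):
    """Emulate row_number() OVER (PARTITION BY key): map (key, rn) -> row."""
    counters = defaultdict(int)
    numbered = {}
    for row in rows:
        key = row[key_index]
        counters[key] += 1              # rn starts at 1 within each key
        numbered[(key, counters[key])] = row
    return numbered


def pair_users_with_addresses(users, addresses):
    """users: (users_id, country) tuples; addresses: (id, country) tuples.

    Returns {users_id: address id}, one distinct address per user.
    """
    s = number_within_group(users, 1)       # like CTE s
    r = number_within_group(addresses, 1)   # like CTE r
    # Join on (country, rn): the n-th user of a country gets its n-th new address.
    return {s[k][0]: r[k][0] for k in s if k in r}


users = [(1, "DE"), (2, "FR"), (3, "DE")]
new_addresses = [(10, "DE"), (11, "FR"), (12, "DE")]
mapping = pair_users_with_addresses(users, new_addresses)
```

Which address goes to which user within a country is arbitrary. That is fine here, since all freshly inserted addresses for one country are interchangeable.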
It is not as straightforward with version 8.3. One possible way:
    INSERT INTO addresses (country)
    SELECT DISTINCT country    -- pick just one per set of dupes
    FROM   users
    WHERE  address_id IS NULL;

    UPDATE users u
    SET    address_id = a.id
    FROM   addresses a
    WHERE  a.country = u.country
    AND    u.address_id IS NULL
    AND    NOT EXISTS (        -- only use an address no user references yet
       SELECT 1 FROM users x
       WHERE  x.address_id = a.id
       )
    AND    NOT EXISTS (        -- effectively picking the smallest users_id per set of dupes
       SELECT 1 FROM users b
       WHERE  b.country = u.country
       AND    b.address_id IS NULL
       AND    b.users_id < u.users_id
       );
Repeat this until the last NULL value is gone from users.address_id.
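The repeat-until-done loop could be driven from a small script. The sketch below is only an illustration under stated assumptions: it runs against Python's standard-library sqlite3 rather than Postgres 8.3, and it writes the UPDATE with correlated subqueries instead of UPDATE ... FROM. It also assumes that country is never NULL for the users to migrate and that each round should only hand out addresses no user references yet. The names migrate_in_rounds and demo are hypothetical.

```python
import sqlite3

INSERT_SQL = """
INSERT INTO addresses (country)
SELECT DISTINCT country            -- one new address per country and round
FROM   users
WHERE  address_id IS NULL AND country IS NOT NULL
"""

UPDATE_SQL = """
UPDATE users
SET    address_id = (              -- the address of this country nobody references yet
   SELECT a.id FROM addresses a
   WHERE  a.country = users.country
   AND    NOT EXISTS (SELECT 1 FROM users x WHERE x.address_id = a.id)
   )
WHERE  address_id IS NULL
AND    NOT EXISTS (                -- only the smallest unlinked users_id per country
   SELECT 1 FROM users b
   WHERE  b.country = users.country
   AND    b.address_id IS NULL
   AND    b.users_id < users.users_id
   )
"""

PENDING_SQL = """
SELECT count(*) FROM users
WHERE  address_id IS NULL AND country IS NOT NULL
"""


def migrate_in_rounds(conn):
    """Run INSERT + UPDATE until no user is left without an address; return the round count."""
    rounds = 0
    while conn.execute(PENDING_SQL).fetchone()[0]:
        conn.execute(INSERT_SQL)
        conn.execute(UPDATE_SQL)
        rounds += 1
    conn.commit()
    return rounds


def demo():
    """Three users share 'DE', so three rounds are needed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE addresses (id INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE users (users_id INTEGER PRIMARY KEY, country TEXT,
                            address_id INTEGER REFERENCES addresses(id));
        INSERT INTO users (country) VALUES ('DE'), ('DE'), ('FR'), ('DE');
    """)
    return conn, migrate_in_rounds(conn)
```

The number of rounds equals the size of the largest group of users sharing a country. With many duplicates per country this gets slow, which is one more reason to prefer the single-statement CTE where the server version allows it.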