amo*_*sai 3 sql arrays postgresql duplicates sql-update
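I have a PostgreSQL table with a column that holds arrays of strings. In some rows all array elements are unique; in others some strings are duplicated. I want to remove the duplicate strings from each row where they exist.
I tried a few queries but couldn't get it to work.
Here is the table: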
veh_id | vehicle_types
--------+----------------------------------------
1 | {"byd_tang","volt","viper","laferrari"}
2 | {"volt","viper"}
3 | {"byd_tang","sonata","jaguarxf"}
4 | {"swift","teslax","mirai"}
5 | {"volt","viper"}
6 | {"viper","ferrariff","bmwi8","viper"}
7 | {"ferrariff","viper","viper","volt"}
I expect the following output:
veh_id | vehicle_types
--------+----------------------------------------
1 | {"byd_tang","volt","viper","laferrari"}
2 | {"volt","viper"}
3 | {"byd_tang","sonata","jaguarxf"}
4 | {"swift","teslax","mirai"}
5 | {"volt","viper"}
6 | {"viper","ferrariff","bmwi8"}
7 | {"ferrariff","viper","volt"}
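To reproduce the example, a setup along these lines should work. The table name `vehicle` matches the queries in the answer; the column types (`int` and `text[]`) are assumptions, since the question does not show the table definition.

```sql
-- Assumed definition: the question does not show the actual DDL.
CREATE TABLE vehicle (
   veh_id        int PRIMARY KEY
 , vehicle_types text[]
);

-- Sample rows copied from the question.
INSERT INTO vehicle (veh_id, vehicle_types) VALUES
   (1, '{byd_tang,volt,viper,laferrari}')
 , (2, '{volt,viper}')
 , (3, '{byd_tang,sonata,jaguarxf}')
 , (4, '{swift,teslax,mirai}')
 , (5, '{volt,viper}')
 , (6, '{viper,ferrariff,bmwi8,viper}')
 , (7, '{ferrariff,viper,viper,volt}');
```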
Since each row's array is independent, a plain correlated subquery with an ARRAY constructor does the job:
SELECT *, ARRAY(SELECT DISTINCT unnest (vehicle_types)) AS vehicle_types_uni
FROM vehicle;
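Note that NULL is converted to an empty array ('{}'). We would need to special-case it, but it is excluded in the UPDATE below anyway.
Fast and simple. But don't use this. You didn't say so, but typically you want to preserve the original order of array elements, and your rudimentary sample suggests as much. Use WITH ORDINALITY in the correlated subquery, which makes it a bit more involved: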
SELECT *, ARRAY (SELECT v
                 FROM   unnest(vehicle_types) WITH ORDINALITY t(v, ord)
                 GROUP  BY 1
                 ORDER  BY min(ord)
                ) AS vehicle_types_uni
FROM   vehicle;
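A quick way to check the order-preserving logic in isolation is to run it against a literal array (row 7 from the sample data) rather than the table. This is only an illustrative sketch:

```sql
SELECT ARRAY (
   SELECT v
   FROM   unnest('{ferrariff,viper,viper,volt}'::text[]) WITH ORDINALITY t(v, ord)
   GROUP  BY 1          -- one row per distinct element
   ORDER  BY min(ord)   -- sorted by the position of its first occurrence
) AS vehicle_types_uni;
-- returns {ferrariff,viper,volt}
```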
An UPDATE to actually remove the duplicates:
UPDATE vehicle
SET    vehicle_types = ARRAY (
          SELECT v
          FROM   unnest(vehicle_types) WITH ORDINALITY t(v, ord)
          GROUP  BY 1
          ORDER  BY min(ord)
       )
WHERE  cardinality(vehicle_types) > 1  -- optional
AND    vehicle_types <> ARRAY (
          SELECT v
          FROM   unnest(vehicle_types) WITH ORDINALITY t(v, ord)
          GROUP  BY 1
          ORDER  BY min(ord)
       );  -- suppress empty updates (optional)
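Both added WHERE conditions are optional and only there to improve performance. The first one is completely redundant: the second already excludes every row that the first would exclude. Each condition also excludes the NULL case. The second suppresses all empty updates.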
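Before running the UPDATE, the second condition can be reused in a plain SELECT to preview which rows would actually change. This is a verification sketch, not part of the original solution:

```sql
-- Lists only rows whose array differs from its de-duplicated form.
SELECT veh_id, vehicle_types
FROM   vehicle
WHERE  vehicle_types <> ARRAY (
          SELECT v
          FROM   unnest(vehicle_types) WITH ORDINALITY t(v, ord)
          GROUP  BY 1
          ORDER  BY min(ord)
       );
```

With the sample data from the question, this should list only rows 6 and 7.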
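If you tried this without preserving the original order, you would likely update most rows needlessly, because the order of elements changed even where there were no duplicates.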
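The effect can be checked on an array without duplicates. SELECT DISTINCT gives no guarantee about output order, so the comparison below may come out true even though nothing was removed; the actual result depends on the query plan.

```sql
-- The input has no duplicates, yet the rebuilt array may differ in element order.
SELECT a <> ARRAY (SELECT DISTINCT unnest(a)) AS would_be_updated
FROM  (SELECT '{swift,teslax,mirai}'::text[] AS a) sub;
```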
Requires Postgres 9.4 or later.