My data is structured as follows:
create table T
( field_1 char(1) not null
, field_2 int not null
, primary key(field_1, field_2)
);
insert into T (field_1, field_2)
values ('A',1)
,('A',2)
,('B',3)
,('C',3)
,('D',4)
,('E',4)
,('E',5)
,('F',6)
,('G',1);
I want to get the sets of rows that have a field_1 or a field_2 in common, like this:
field_1 | field_2 | set
-------------------------------
A | 1 | one
A | 2 | one
G | 1 | one
-------------------------------
B | 3 | two
C | 3 | two
-------------------------------
D | 4 | three
E | 4 | three
E | 5 | three
-------------------------------
F | 6 | four
Basically, rows that share a value in either field_1 or field_2 should be assigned the same ID. This feels like a very basic operation, but for some reason I can't get it right.
Any clues?
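Here is an attempt that I'm not happy with for several reasons, but it's a start. The idea is to compute the transitive closure of the graph and then use the smallest field_2 for each field_1 as the group (the columns are called y and x in the table definition below). The latter relies on the fact that two nodes either share the same closure or have disjoint closures. For example: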
{(a,1),(a,2),(b,2),(b,3)}
is an impossible transitive closure, because if a and b share 2, they must also share 1 and 3.
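To check that property outside of SQL, here is a minimal union-find sketch in Python. It is not part of the Db2 solution: the names `group_rows`, `find` and `union` are my own, and rows are assumed to be plain `(field_1, field_2)` tuples. Every field_1 value and every field_2 value is a node, each row is an edge, and each connected component is labelled with its smallest field_2.

```python
ROWS = [("A", 1), ("A", 2), ("B", 3), ("C", 3), ("D", 4),
        ("E", 4), ("E", 5), ("F", 6), ("G", 1)]


def group_rows(rows):
    """Return (field_1, field_2, grp) per row; grp is the smallest field_2 in the row's component."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag the two columns so that a field_1 value can never collide with a field_2 value.
    for f1, f2 in rows:
        union(("f1", f1), ("f2", f2))

    # Smallest field_2 seen in each component.
    smallest = {}
    for f1, f2 in rows:
        root = find(("f1", f1))
        smallest[root] = min(smallest.get(root, f2), f2)

    return [(f1, f2, smallest[find(("f1", f1))]) for f1, f2 in rows]


if __name__ == "__main__":
    for row in group_rows(ROWS):
        print(*row)
```

Feeding it the impossible closure above, `group_rows([("a", 1), ("a", 2), ("b", 2), ("b", 3)])`, puts all four rows into one group, which is why a and b cannot end up with different closures.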
I used Db2 (it supports neither the keyword recursive nor ANSI joins in a CTE):
CREATE TABLE t (x char(1) not null, y int not null, primary key (x,y));
INSERT INTO t (x,y)
VALUES ('A',1)
,('A',2)
,('B',3)
,('C',3)
,('D',4)
,('E',4)
,('E',5)
,('F',6)
,('G',1);
with tc (x, y, n) as (
    -- base case: every row reaches its own y
    select x, y, 0 from t
    union all
    -- step: hop to another x that shares this y, then pick up that x's other y values
    select t1.x, t3.y, n+1
    from tc as t1, t as t2, t as t3
    where t1.y = t2.y
      and t1.x <> t2.x
      and t2.x = t3.x
      and t2.y <> t3.y
      -- union all never drops repeats, so cap the number of hops at count(t)
      and n <= (select count(1) from t)
), min_closure (x, y) as (
    -- remove duplicates
    select distinct x, y from tc
)
select t.x, t.y
     , ( select min(mc1.y)
         from min_closure mc1
         where t.x = mc1.x ) as grp
from t
order by grp;
A 1 1
G 1 1
A 2 1
B 3 3
C 3 3
D 4 4
E 4 4
E 5 4
F 6 6
The group value could be adjusted further, but I'll leave it at that.
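One possible adjustment is to number the groups consecutively with dense_rank(). The sketch below runs through Python's built-in sqlite3 module rather than Db2, only so that it is easy to try. The table name r and the column set_no are hypothetical, the rows are the result printed above, and it assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# (x, y, grp) exactly as printed in the result above
RESULT = [("A", 1, 1), ("G", 1, 1), ("A", 2, 1), ("B", 3, 3), ("C", 3, 3),
          ("D", 4, 4), ("E", 4, 4), ("E", 5, 4), ("F", 6, 6)]

RENUMBER_SQL = """
select x, y,
       dense_rank() over (order by grp) as set_no   -- 1, 2, 3, ... with no gaps
from r
order by set_no, x, y
"""


def renumber(rows):
    """Replace the min-y group label by a consecutive set number."""
    con = sqlite3.connect(":memory:")
    con.execute("create table r (x text, y int, grp int)")  # hypothetical staging table
    con.executemany("insert into r (x, y, grp) values (?, ?, ?)", rows)
    out = con.execute(RENUMBER_SQL).fetchall()
    con.close()
    return out


if __name__ == "__main__":
    for row in renumber(RESULT):
        print(*row)
```

With the sample data the groups 1, 3, 4 and 6 become the sets 1, 2, 3 and 4, matching the one/two/three/four labels asked for in the question.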
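The main problem is that the CTE can only see the previous iteration, so we are likely to add a lot of redundant edges to tc. For a large graph, that won't be pleasant.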
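Where the engine accepts UNION (rather than UNION ALL) in a recursive CTE, as PostgreSQL and SQLite do, pairs that were already produced are discarded as they appear, so the redundancy goes away and the counter n is no longer needed to stop the recursion. Below is a sketch run through Python's sqlite3 module, assuming the same table t(x, y); the function name `groups` is my own, and I have not checked whether Db2 accepts this form.

```python
import sqlite3

ROWS = [("A", 1), ("A", 2), ("B", 3), ("C", 3), ("D", 4),
        ("E", 4), ("E", 5), ("F", 6), ("G", 1)]

CLOSURE_SQL = """
with recursive tc (x, y) as (
    select x, y from t
    union                            -- set semantics: a pair already produced is dropped
    select t1.x, t3.y
    from tc as t1
    join t as t2 on t2.y = t1.y      -- any row sharing this y
    join t as t3 on t3.x = t2.x      -- every y carried by that row's x
)
select t.x, t.y,
       (select min(tc.y) from tc where tc.x = t.x) as grp
from t
order by grp, t.x, t.y
"""


def groups(rows):
    """Load rows into an in-memory t(x, y) and return (x, y, grp) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (x text not null, y int not null, primary key (x, y))")
    con.executemany("insert into t (x, y) values (?, ?)", rows)
    out = con.execute(CLOSURE_SQL).fetchall()
    con.close()
    return out


if __name__ == "__main__":
    for row in groups(ROWS):
        print(*row)
```

The recursion ends on its own because there are only finitely many (x, y) pairs. Called with the chain ('F',5),('F',6),('G',6),('G',7),('H',7),('H',8), every row comes back with grp 5.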
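Edit: I tried to minimize the impact by only looking in one direction in the graph. Note that a one-directional walk can miss members of a group, so the general case has to follow edges both ways.
Example with a longer chain: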
delete from t;
insert into t (x,y) values ('F',5),('F',6),('G',6),('G',7),('H',7),('H',8);
with tc (x,y,n) as (select x,y,0 ...)
select distinct t.x, t.y, min(mc1.y) over (partition by mc1.x) as grp
from min_closure mc1 join t
on t.x = mc1.x order by 3;
F 5 5
F 6 5
G 6 5
G 7 5
H 7 5
H 8 5
Edit: adapted for the PostgreSQL dialect:
with recursive tc (x, y, n) as (
    -- base case: every row reaches its own y
    select x, y, 0 from t
    union all
    -- step: hop to another x that shares this y, then pick up that x's other y values
    select t1.x, t3.y, n+1
    from tc as t1
    join t as t2
        on t1.y = t2.y
       and t1.x <> t2.x
    join t as t3
        on t2.x = t3.x
       and t2.y <> t3.y
    -- union all never drops repeats, so cap the number of hops at count(t)
    where n <= (select count(1) from t)
), min_closure (x, y) as (
    -- remove duplicates
    select distinct x, y from tc
)
select t.x, t.y
     , ( select min(mc1.y)
         from min_closure mc1
         where t.x = mc1.x ) as grp
from t
order by 1;