Grouping rows on field A or field B

gab*_*lee 5 postgresql

My data is structured as follows:

create table T 
( field_1 char(1) not null
, field_2 int not null
,    primary key(field_1, field_2)
);

insert into T (field_1, field_2)
values ('A',1)
      ,('A',2)
      ,('B',3)
      ,('C',3)
      ,('D',4)   
      ,('E',4)
      ,('E',5)
      ,('F',6)
      ,('G',1);

I would like to get the sets of rows that have a field_1 or a field_2 in common, like this:

field_1     | field_2   | set
-------------------------------
A           | 1         | one
A           | 2         | one
G           | 1         | one
-------------------------------
B           | 3         | two
C           | 3         | two
-------------------------------
D           | 4         | three
E           | 4         | three
E           | 5         | three
-------------------------------
F           | 6         | four

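Basically, rows that have a value in common in either field_1 or field_2 should be assigned the same ID. I feel this is a very basic operation, but for some reason I can't get it right.

Any clues?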

Len*_*art 1

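Here is an attempt that I'm not happy with for several reasons, but it's a start. The idea is to compute the transitive closure of the graph and then use the smallest field_2 for each field_1 as the group. The latter relies on the fact that two nodes either share the same closure or have disjoint closures. As an example: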

{(a,1),(a,2),(b,2),(b,3)}

is an impossible transitive closure, because if a and b share 2, they must also share 1 and 3.
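Put differently, the rows fall into the connected components of a graph whose nodes are the field_1 and field_2 values. As a cross-check outside the database, here is a minimal Python sketch that finds those components with a union-find structure (a different technique from the SQL in this answer) and labels each one with its smallest field_2. The names `group_rows`, `find` and `union` are made up for illustration.

```python
def group_rows(rows):
    """Map each (field_1, field_2) row to the smallest field_2 in its component."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Tag the values so a field_1 value can never collide with a field_2 value.
    for f1, f2 in rows:
        union(("f1", f1), ("f2", f2))

    # Smallest field_2 per component, computed once all unions are done.
    smallest = {}
    for f1, f2 in rows:
        root = find(("f1", f1))
        smallest[root] = min(smallest.get(root, f2), f2)

    return {(f1, f2): smallest[find(("f1", f1))] for f1, f2 in rows}


rows = [("A", 1), ("A", 2), ("B", 3), ("C", 3), ("D", 4),
        ("E", 4), ("E", 5), ("F", 6), ("G", 1)]
groups = group_rows(rows)
for row in sorted(rows, key=lambda r: (groups[r], r)):
    print(row[0], row[1], groups[row])
```

On the sample data from the question this assigns the group values 1, 3, 4 and 6, which can be compared against whatever the SQL returns.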

I used Db2 (it supports neither the keyword recursive in a CTE nor ANSI joins in a CTE):

CREATE TABLE t (x char(1) not null, y int not null, primary key (x,y));
INSERT INTO t (x,y)
VALUES ('A',1) 
      ,('A',2)
      ,('B',3)
      ,('C',3)
      ,('D',4)   
      ,('E',4)
      ,('E',5)
      ,('F',6)
      ,('G',1);

with tc (x,y,n) as (
    select x,y,0 from t
    union all
    -- follow a shared y to another x (t2), then pick up that x's
    -- smaller y values (t3); moving only toward smaller y keeps the
    -- row count down, but a link that needs a step to a larger y
    -- first is not followed
    select t1.x, t3.y, n+1
    from tc as t1, t as t2, t as t3
    where t1.y = t2.y
      and t1.x <> t2.x
      and t2.x = t3.x
      and t3.y < t2.y
      and n <= (select count(1) from t)
), min_closure(x,y) as (
    -- remove duplicates
    select distinct x,y from tc
)
select t.x, t.y
     , ( select min(mc1.y)
         from min_closure mc1
         where t.x = mc1.x ) as grp
from t
order by grp;

A           1           1
G           1           1
A           2           1
B           3           3
C           3           3
D           4           4
E           4           4
E           5           4
F           6           6

The group identifier could be polished further, but I'll leave it at that.

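The main problem is that the CTE can only see the previous iteration, so we will most likely add a lot of redundant edges to tc. For large graphs this will not be pleasant.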
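To illustrate the point about redundant edges: a procedural loop can remember every pair found so far and expand only the pairs that are new, which the recursive CTE cannot do because it only sees the previous iteration. The Python sketch below computes the full closure that way, following links in both directions. It is only an illustration and not part of the SQL solution; `closure` and `min_group` are hypothetical names.

```python
def closure(rows):
    """All (field_1, field_2) pairs reachable through shared values."""
    rows = set(rows)
    seen = set(rows)       # every pair found so far
    frontier = set(rows)   # pairs discovered in the previous iteration
    while frontier:
        step = set()
        for f1, f2 in frontier:
            # field_1 values that share f2 with the current pair
            neighbours = {g1 for g1, g2 in rows if g2 == f2 and g1 != f1}
            # pair f1 with every field_2 those neighbours carry
            step |= {(f1, h2) for h1, h2 in rows if h1 in neighbours}
        frontier = step - seen   # drop pairs that are already known
        seen |= frontier
    return seen


def min_group(rows):
    """Smallest reachable field_2 for each field_1."""
    grp = {}
    for f1, f2 in closure(rows):
        grp[f1] = min(grp.get(f1, f2), f2)
    return grp


chain = [("F", 5), ("F", 6), ("G", 6), ("G", 7), ("H", 7), ("H", 8)]
print(min_group(chain))
```

Because `frontier` only ever holds pairs that have not been seen before, the loop stops as soon as an iteration finds nothing new, so no iteration counter like n is needed.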

EDIT: tried to minimise the impact by looking in only one direction of the graph.

Example with a longer chain:

delete from t;
insert into t (x,y) values ('F',5),('F',6),('G',6),('G',7),('H',7),('H',8);
with tc (x,y,n) as (select x,y,0 ...)
select distinct t.x, t.y, min(mc1.y) over (partition by mc1.x) as grp 
from min_closure mc1 join t 
    on t.x = mc1.x order by 3;

F           5           5
F           6           5
G           6           5
G           7           5
H           7           5
H           8           5

EDIT: adapted to the PostgreSQL dialect:

with recursive tc (field_1,field_2,n) as (
    select field_1,field_2,0 from t
    union all
    -- follow a shared field_2 to another field_1 (t2), then pick up
    -- that field_1's smaller field_2 values (t3); moving only toward
    -- smaller field_2 keeps the row count down, but a link that needs
    -- a step to a larger field_2 first is not followed
    select t1.field_1, t3.field_2, n+1
    from tc as t1
    join t as t2
        on t1.field_2 = t2.field_2
       and t1.field_1 <> t2.field_1
    join t as t3
        on t2.field_1 = t3.field_1
       and t3.field_2 < t2.field_2
    where n <= (select count(1) from t)
), min_closure(field_1,field_2) as (
    -- remove duplicates
    select distinct field_1,field_2 from tc
)
select t.field_1, t.field_2
     , ( select min(mc1.field_2)
         from min_closure mc1
         where t.field_1 = mc1.field_1 ) as grp
from t
order by 1;