Matching records across multiple possible IDs

Question

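I have multiple records with sparse identifiers (I call these ID numbers). Each record can have up to two different ID numbers, and I want to be able to walk through all related records together so that I can create one shared identifier. I want to do this in a T-SQL query.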

Essentially, here is some sample data:

+-------+-------+--------+-----+------+
| RowId |  ID1  |  ID2   | ID3 | ID4  |
+-------+-------+--------+-----+------+
|     1 | 11111 |        |     |      |
|     2 | 11111 |        |     |      |
|     3 | 11111 | AAAAA  |     |      |
|     4 |       | BBBBBB | BC1 |      |
|     5 |       |        | BC1 | O111 |
|     6 |       | GGGGG  | BC1 |      |
|     7 |       | AAAAA  |     | O111 |
|     8 |       | CCCCCC |     |      |
|     9 | 99999 |        |     |      |
|    10 | 99999 | DDDDDD |     |      |
|    11 |       |        |     | O222 |
|    12 |       | EEEEEE |     | O222 |
|    13 |       | EEEEEE |     | O333 |
+-------+-------+--------+-----+------+


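So, for example, 11111 links to AAAAA in RowId 3, and AAAAA also links to O111 in RowId 7. O111 links to BC1 in RowId 5. BC1 links to BBBBBB in RowId 4, and so on. In addition, once all of these rows are linked, I want to create a new single identifier for them.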

This is the output I would like to get for all of the data above:

Denormalised:
+---------+-------+--------+-----+------+
| GroupId |  ID1  |  ID2   | ID3 | ID4  |
+---------+-------+--------+-----+------+
|       1 | 11111 | AAAAA  | BC1 | O111 |
|       1 | 11111 | BBBBBB | BC1 | O111 |
|       1 | 11111 | GGGGG  | BC1 | O111 |
|       2 |       | CCCCCC |     |      |
|       3 | 99999 | DDDDDD |     |      |
|       4 |       | EEEEEE |     | O222 |
|       4 |       | EEEEEE |     | O333 |
+---------+-------+--------+-----+------+


Normalized (probably better to work with): 

+--------+----------+---------+
| IDType | IDNumber | GroupId |
+--------+----------+---------+
| ID1    | 11111    |       1 |
| ID2    | AAAAA    |       1 |
| ID2    | BBBBBB   |       1 |
| ID2    | GGGGG    |       1 |
| ID3    | BC1      |       1 |
| ID4    | O111     |       1 |
| ID2    | CCCCCC   |       2 |
| ID1    | 99999    |       3 |
| ID2    | DDDDDD   |       3 |
| ID2    | EEEEEE   |       4 |
| ID4    | O222     |       4 |
| ID4    | O333     |       4 |
+--------+----------+---------+


I am looking for SQL code that produces the output above, or a similar normalized structure. Thanks.

Edit: here is some code to create data matching the sample data in the table above.

DROP TABLE IF EXISTS #ID
CREATE TABLE #ID
    (
        RowId   INT,
        ID1 VARCHAR(100),
        ID2 VARCHAR(100),
        ID3 VARCHAR(100),
        ID4 VARCHAR(100)
    )

INSERT INTO #ID VALUES 
    (1,'11111',NULL,NULL,NULL),
    (2,'11111',NULL,NULL,NULL),
    (3,'11111','AAAAA',NULL,NULL),
    (4,NULL,'BBBBBB','BC1',NULL),
    (5,NULL,NULL,'BC1','O111'),
    (6,NULL,'GGGGG','BC1',NULL),
    (7,NULL,'AAAAA',NULL,'O111'),
    (8,NULL,'CCCCCC',NULL,NULL),
    (9,'99999',NULL,NULL,NULL),
    (10,'99999','DDDDDD',NULL,NULL),
    (11,NULL,NULL,NULL,'O222'),
    (12,NULL,'EEEEEE',NULL,'O222'),
    (13,NULL,'EEEEEE',NULL,'O333')


Answer 1

The*_*ler 0

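I don't quite follow the structure of the expected result, but the key part of the query is assembling the nodes into subgraphs while giving each subgraph an ID (what you call GroupId).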

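I'll leave the final rendering of the result to you, since you probably know in detail why you want it displayed this way. A few LEFT JOINs should do it.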

Anyway, here is the query that produces the subgraphs:

with
p as (
  select
    row_id, row_id as min_id,
    cast(concat(':', row_id, ':') as varchar(1000)) as walked,
    case when id1 is null then ':' else cast(concat(':', id1, ':') as varchar(1000)) end as i1,
    case when id2 is null then ':' else cast(concat(':', id2, ':') as varchar(1000)) end as i2,
    case when id3 is null then ':' else cast(concat(':', id3, ':') as varchar(1000)) end as i3,
    case when id4 is null then ':' else cast(concat(':', id4, ':') as varchar(1000)) end as i4
  from t
  union all
  select
    t.row_id, case when t.row_id < p.min_id then t.row_id else p.min_id end,
    cast(concat(walked, t.row_id, ':') as varchar(1000)),
    case when t.id1 is null then p.i1 else cast(concat(p.i1, id1, ':') as varchar(1000)) end,
    case when t.id2 is null then p.i2 else cast(concat(p.i2, id2, ':') as varchar(1000)) end,
    case when t.id3 is null then p.i3 else cast(concat(p.i3, id3, ':') as varchar(1000)) end,
    case when t.id4 is null then p.i4 else cast(concat(p.i4, id4, ':') as varchar(1000)) end
  from p
  join t on p.i1 like concat('%:', t.id1, ':%')
         or p.i2 like concat('%:', t.id2, ':%')
         or p.i3 like concat('%:', t.id3, ':%')
         or p.i4 like concat('%:', t.id4, ':%')
  where p.walked not like concat('%:', t.row_id, ':%')
),
g as (
  select min_id as min_id, min(walked) as nodes
  from p
  where not exists (
    select 1
    from t 
    where (p.i1 like concat('%:', t.id1, ':%')
        or p.i2 like concat('%:', t.id2, ':%')
        or p.i3 like concat('%:', t.id3, ':%')
        or p.i4 like concat('%:', t.id4, ':%'))
       and p.walked not like concat('%:', t.row_id, ':%')
  )
  group by min_id
)
select row_number() over(order by min_id) as group_id, nodes from g


Result:

group_id  nodes          
--------  ---------------
1         :1:2:3:7:5:4:6:                                     
2         :8:            
3         :10:9:         
4         :11:12:13:
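
The final rendering that the answer leaves open could be sketched roughly as follows. This is only an illustration, not the answerer's code: it assumes the grouped result has been saved in a temp table called #g (a hypothetical name, for example by adding `into #g` to the last select of the query), and that t is the test table used by the answer. Instead of LEFT JOINs it uses CROSS APPLY with a VALUES list to unpivot the four ID columns, which is a swapped-in technique.

```sql
-- Assumption: #g (hypothetical name) holds the result of the query above.
-- Here it is filled by hand with the four rows shown in the result.
drop table if exists #g;
create table #g (group_id int, nodes varchar(1000));
insert into #g (group_id, nodes) values
  (1, ':1:2:3:7:5:4:6:'),
  (2, ':8:'),
  (3, ':10:9:'),
  (4, ':11:12:13:');

-- Map every row of t to its group through the ':'-delimited path,
-- then unpivot ID1..ID4 into one (id_type, id_number) pair per value.
select distinct
  v.id_type,
  v.id_number,
  g.group_id
from #g as g
join t
  on g.nodes like concat('%:', t.row_id, ':%')   -- row_id appears in this group's path
cross apply (values
  ('ID1', cast(t.id1 as varchar(100))),
  ('ID2', cast(t.id2 as varchar(100))),
  ('ID3', cast(t.id3 as varchar(100))),
  ('ID4', cast(t.id4 as varchar(100)))
) as v (id_type, id_number)
where v.id_number is not null                    -- skip empty ID slots
order by g.group_id, v.id_type, v.id_number;
```

The result should have the same shape as the normalized table in the question (IDType, IDNumber, GroupId). The exact values depend on the contents of t.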


For reference, this is the data script I used for testing:

create table t (
  row_id int,
  id1 int,
  id2 varchar(10),
  id3 varchar(10),
  id4 varchar(10)
);

insert into t (row_id, id1, id2, id3, id4) values 
  (1,  '11111', null,     null,  null),
  (2,  '11111', null,     null,  null),
  (3,  '11111', 'AAAAA',  null,  null),
  (4,  null,    'BBBBB',  'BC1', null),
  (5,  null,    null,     'BC1', '0111'),
  (6,  null,    'GGGGG',  'BC1', null),
  (7,  null,    'AAAAA',  null,  '0111'),
  (8,  null,    'CCCCCC', null,  null),
  (9,  '99999', null,     null,  null),
  (10, '99999', 'DDDDD',  null,  null),
  (11, null,    null,     null,  '0222'),
  (12, null,    'EEEEE',  null,  '0222'),
  (13, null,    'EEEEE',  null,  '0333');


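Note: I can imagine this query being quite slow. A PostgreSQL solution would probably perform much better because, unlike SQL Server, it implements UNION in recursive CTEs. Compared with UNION ALL (the only option in SQL Server), that allows entire branches to be pruned earlier during the graph traversal.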
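
If the recursive CTE turns out to be too slow, one possible alternative in SQL Server is an iterative loop that repeatedly lowers each row's group label to the smallest label among the rows it shares an ID value with. The sketch below is not part of the original answer. It assumes the same table t with unique row_id values, and the temp table names #edge and #grp as well as the variable @changed are made up for illustration.

```sql
-- One (row_id, id_type, id_number) row per non-null ID value.
drop table if exists #edge;
select t.row_id, v.id_type, v.id_number
into #edge
from t
cross apply (values
  ('ID1', cast(t.id1 as varchar(100))),
  ('ID2', cast(t.id2 as varchar(100))),
  ('ID3', cast(t.id3 as varchar(100))),
  ('ID4', cast(t.id4 as varchar(100)))
) as v (id_type, id_number)
where v.id_number is not null;

-- Every row starts in its own group, labelled with its own row_id.
drop table if exists #grp;
select row_id, row_id as group_key
into #grp
from t;

declare @changed int = 1;

-- Repeat until no label changes any more.
while @changed > 0
begin
  update g
  set group_key = m.min_key
  from #grp as g
  join (
    -- Smallest current label among all rows sharing an ID value with e1.row_id.
    select e1.row_id, min(g2.group_key) as min_key
    from #edge as e1
    join #edge as e2
      on e2.id_type = e1.id_type
     and e2.id_number = e1.id_number
    join #grp as g2
      on g2.row_id = e2.row_id
    group by e1.row_id
  ) as m
    on m.row_id = g.row_id
  where m.min_key < g.group_key;

  set @changed = @@rowcount;   -- number of labels lowered in this pass
end;

-- Turn the surviving labels into consecutive group numbers.
select dense_rank() over (order by group_key) as group_id, row_id
from #grp
order by group_id, row_id;
```

Each pass only joins on equal ID values, so it never enumerates paths the way the recursive query does. The number of passes grows with the longest chain of links inside a group. Indexing #edge on (id_type, id_number) may help on larger data.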
