如何将连接的子图 ID 分配给 Oracle SQL 中无向图中的每个节点?

Ste*_*pan 3 sql graph oracle11g

如何在from_node和定义的 link_tbl 中对一组链接进行分区to_node。有 10M 个节点和 25M 个链接(每个节点不超过 20 个链接)。

例如
在此处输入图片说明

该图由三个不相交的子图组成。

  create table link_tbl as (
  select 'AY' as linkid, 'A' as from_node, 'Y' as to_node from dual union all
  select 'AB', 'A', 'B' from dual union all
  select 'CA', 'C', 'A' from dual union all      
  select 'GE', 'G', 'E' from dual union all
  select 'CB', 'C', 'B' from dual union all
  select 'EF', 'E', 'F' from dual union all
  select 'NM', 'N', 'M' from dual
  );
  --compute subnetid
  select * from link_tbl order by subnetid;
Run Code Online (Sandbox Code Playgroud)

要获得带有subnetid值的结果集?

在此处输入图片说明

我可以使用 Java 中洪水填充的变体将子图 id 分配给图的每个节点。但是这可以在 SQL 中完成吗?

伪代码:
- 用连续整数排列节点。- 将链接表示为(node1*100M + node2) as number(19,0) - 联合链接 node1、node2 与 node2、node1 和排序 - 获取第一个节点作为锚点node并添加以subgraph_nodes' - Iterate over link table -添加 node2to subgraph_nodes if node1 is in (subgraph_nodes) AND node2 not in subgraph_nodes

这应该将所有连接的节点添加到 subgraph_nodes

这就足够了,因为现在我可以将 subgraph_id' 添加到节点表并选择所有没有subgraph id并重复子图分析的节点。

有 10M 个节点表示为连续整数和 25M 个链接(每个节点不超过 20 个链接),表示为(from_node*100M + to_node) as ID, from_node, to_node

Vla*_*nov 5

前段时间我写了一个关于如何找到无向图的所有连通子图的答案。它是为 SQL Server 编写的,但 Oracle 支持标准递归查询,因此很容易将其转换为 Oracle。使用特定于 Oracle 的构造可以更有效地编写它。

样本数据

create table link_tbl as (
  select 'AY' as linkid, 'A' as from_node, 'Y' as to_node from dual union all
  select 'AB', 'A', 'B' from dual union all
  select 'CA', 'C', 'A' from dual union all      
  select 'GE', 'G', 'E' from dual union all
  select 'CB', 'C', 'B' from dual union all
  select 'EF', 'E', 'F' from dual union all
  select 'NM', 'N', 'M' from dual
);
Run Code Online (Sandbox Code Playgroud)

询问

WITH
CTE_Nodes
AS
(
    SELECT from_node AS Node
    FROM link_tbl

    UNION

    SELECT to_node AS Node
    FROM link_tbl
)
,CTE_Pairs
AS
(
    SELECT from_node AS Node1, to_node AS Node2
    FROM link_tbl
    WHERE from_node <> to_node

    UNION

    SELECT to_node AS Node1, from_node AS Node2
    FROM link_tbl
    WHERE from_node <> to_node
)
,CTE_Recursive (AnchorNode, Node1, Node2, NodePath, Lvl)
AS
(
    SELECT
        CAST(CTE_Nodes.Node AS varchar(2000)) AS AnchorNode
        , Node1
        , Node2
        , CAST(',' || Node1 || ',' || Node2 || ',' AS varchar(2000)) AS NodePath
        , 1 AS Lvl
    FROM 
        CTE_Pairs
        INNER JOIN CTE_Nodes ON CTE_Nodes.Node = CTE_Pairs.Node1

    UNION ALL

    SELECT 
        CTE_Recursive.AnchorNode
        , CTE_Pairs.Node1
        , CTE_Pairs.Node2
        , CAST(CTE_Recursive.NodePath || CTE_Pairs.Node2 || ',' AS varchar(2000)) AS NodePath
        , CTE_Recursive.Lvl + 1 AS Lvl
    FROM
        CTE_Pairs
        INNER JOIN CTE_Recursive ON CTE_Recursive.Node2 = CTE_Pairs.Node1
    WHERE
        CTE_Recursive.NodePath NOT LIKE CAST('%,' || CTE_Pairs.Node2 || ',%' AS varchar(2000))
)
,CTE_RecursionResult
AS
(
    SELECT AnchorNode, Node1, Node2
    FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
    SELECT AnchorNode, Node1 AS Node
    FROM CTE_RecursionResult

    UNION

    SELECT AnchorNode, Node2 AS Node
    FROM CTE_RecursionResult
)
SELECT
    CTE_Nodes.Node
    ,LISTAGG(CTE_CleanResult.Node, ',') WITHIN GROUP (ORDER BY CTE_CleanResult.Node) AS GroupMembers
    ,DENSE_RANK() OVER (ORDER BY LISTAGG(CTE_CleanResult.Node, ',') WITHIN GROUP (ORDER BY CTE_CleanResult.Node)) AS GroupID
FROM
    CTE_Nodes
    INNER JOIN CTE_CleanResult ON CTE_CleanResult.AnchorNode = CTE_Nodes.Node
GROUP BY
    CTE_Nodes.Node
ORDER BY
    GroupID
    ,CTE_Nodes.Node
;
Run Code Online (Sandbox Code Playgroud)

结果

+------+--------------+---------+
| NODE | GROUPMEMBERS | GROUPID |
+------+--------------+---------+
| A    | A,B,C,Y      |       1 |
| B    | A,B,C,Y      |       1 |
| C    | A,B,C,Y      |       1 |
| Y    | A,B,C,Y      |       1 |
| E    | E,F,G        |       2 |
| F    | E,F,G        |       2 |
| G    | E,F,G        |       2 |
| M    | M,N          |       3 |
| N    | M,N          |       3 |
+------+--------------+---------+
Run Code Online (Sandbox Code Playgroud)

https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=e61cf73824e7718a4686430ccd7398e7

这个怎么运作

CTE_Nodes

CTE_Nodes给出出现在from_nodeto_node列中的所有节点的列表。由于它们可以以任何顺序出现,因此我们将UNION它们列在一起。UNION还会删除任何重复项。

+------+
| NODE |
+------+
| A    |
| B    |
| C    |
| E    |
| F    |
| G    |
| M    |
| N    |
| Y    |
+------+
Run Code Online (Sandbox Code Playgroud)

CTE_Pairs

CTE_Pairs给出图在两个方向上的所有边的列表。同样,UNION用于删除任何重复项。

+-------+-------+
| NODE1 | NODE2 |
+-------+-------+
| A     | B     |
| A     | C     |
| A     | Y     |
| B     | A     |
| B     | C     |
| C     | A     |
| C     | B     |
| E     | F     |
| E     | G     |
| F     | E     |
| G     | E     |
| M     | N     |
| N     | M     |
| Y     | A     |
+-------+-------+
Run Code Online (Sandbox Code Playgroud)

CTE_Recursive

CTE_Recursive是查询的主要部分,它从每个唯一的节点开始递归遍历图。这些起始行由 的第一部分产生UNION ALLUNION ALL递归地连接到自身的第二部分链接Node2Node1. 由于我们预先制作CTE_Pairs了所有边都写在两个方向上,所以我们总是可以只链接Node2Node1图中的所有路径。同时,查询构建NodePath- 到目前为止已遍历的以逗号分隔的节点字符串。它用于WHERE过滤器:

CTE_Recursive.NodePath NOT LIKE CAST('%,' || CTE_Pairs.Node2 || ',%' AS varchar(2000))
Run Code Online (Sandbox Code Playgroud)

一旦我们遇到之前包含在路径中的节点,递归就会停止,因为连接的节点列表已用完。 AnchorNode是递归的起始节点,稍后将用于对结果进行分组。 Lvl并没有真正使用,我把它包括在内是为了更好地理解正在发生的事情。

+------------+-------+-------+-----------+-----+
| ANCHORNODE | NODE1 | NODE2 | NODEPATH  | LVL |
+------------+-------+-------+-----------+-----+
| A          | A     | Y     | ,A,Y,     |   1 |
| A          | A     | C     | ,A,C,     |   1 |
| A          | A     | B     | ,A,B,     |   1 |
| B          | B     | C     | ,B,C,     |   1 |
| B          | B     | A     | ,B,A,     |   1 |
| C          | C     | B     | ,C,B,     |   1 |
| C          | C     | A     | ,C,A,     |   1 |
| E          | E     | G     | ,E,G,     |   1 |
| E          | E     | F     | ,E,F,     |   1 |
| F          | F     | E     | ,F,E,     |   1 |
| G          | G     | E     | ,G,E,     |   1 |
| M          | M     | N     | ,M,N,     |   1 |
| N          | N     | M     | ,N,M,     |   1 |
| Y          | Y     | A     | ,Y,A,     |   1 |
| Y          | A     | B     | ,Y,A,B,   |   2 |
| C          | A     | B     | ,C,A,B,   |   2 |
| Y          | A     | C     | ,Y,A,C,   |   2 |
| B          | A     | C     | ,B,A,C,   |   2 |
| C          | A     | Y     | ,C,A,Y,   |   2 |
| B          | A     | Y     | ,B,A,Y,   |   2 |
| C          | B     | A     | ,C,B,A,   |   2 |
| A          | B     | C     | ,A,B,C,   |   2 |
| B          | C     | A     | ,B,C,A,   |   2 |
| A          | C     | B     | ,A,C,B,   |   2 |
| G          | E     | F     | ,G,E,F,   |   2 |
| F          | E     | G     | ,F,E,G,   |   2 |
| B          | A     | Y     | ,B,C,A,Y, |   3 |
| C          | A     | Y     | ,C,B,A,Y, |   3 |
| Y          | B     | C     | ,Y,A,B,C, |   3 |
| Y          | C     | B     | ,Y,A,C,B, |   3 |
+------------+-------+-------+-----------+-----+
Run Code Online (Sandbox Code Playgroud)

CTE_CleanResult

CTE_CleanResult只留下相关部分 fromCTE_Recursive并再次合并Node1Node2using UNION

+------------+------+
| ANCHORNODE | NODE |
+------------+------+
| A          | A    |
| A          | B    |
| A          | C    |
| A          | Y    |
| B          | A    |
| B          | B    |
| B          | C    |
| B          | Y    |
| C          | A    |
| C          | B    |
| C          | C    |
| C          | Y    |
| E          | E    |
| E          | F    |
| E          | G    |
| F          | E    |
| F          | F    |
| F          | G    |
| G          | E    |
| G          | F    |
| G          | G    |
| M          | M    |
| M          | N    |
| N          | M    |
| N          | N    |
| Y          | A    |
| Y          | B    |
| Y          | C    |
| Y          | Y    |
+------------+------+
Run Code Online (Sandbox Code Playgroud)

最终选择

现在我们需要Node为每个AnchorNode. LISTAGG可以。 DENSE_RANK()计算GroupID每个的数字AnchorNode


效率

您有相当大的表,因此尝试在上面的单个查询中一次查找所有组可能效率很低。

提高效率的一种方法是不对整个数据集使用单个查询。不要尝试同时查找所有子网/组。将起点限制为一个节点。添加WHERE CTE_Nodes.Node = 'some node'到 的第一部分CTE_Recursive。查询将找到一个子网的所有节点。从大表中删除这些找到的节点,选择另一个起始节点,循环重复直到大表为空。如果您实现CTE_NodesCTE_Pairs作为带有索引的临时表,它也可能有所帮助。

我从来没有使用过 Oracle,也不知道它的程序代码语法,所以我将在下面写一个伪代码。

准备临时表

CREATE TABLE Nodes AS
(
    SELECT from_node AS Node
    FROM link_tbl

    UNION

    SELECT to_node AS Node
    FROM link_tbl
);
CREATE INDEX IX_Node ON Nodes (Node);

CREATE TABLE Pairs AS
(
    SELECT from_node AS Node1, to_node AS Node2
    FROM link_tbl
    WHERE from_node <> to_node

    UNION

    SELECT to_node AS Node1, from_node AS Node2
    FROM link_tbl
    WHERE from_node <> to_node
);
CREATE INDEX IX_Node1 ON Pairs (Node1);
CREATE INDEX IX_Node2 ON Pairs (Node2);

CREATE TABLE Subgraph AS
(
    SELECT Node FROM Nodes WHERE 1=0
);

CREATE TABLE Result
(
    GroupID int NOT NULL,
    Node varchar(10) NOT NULL
);
Run Code Online (Sandbox Code Playgroud)

SET :GroupID = 0;
Run Code Online (Sandbox Code Playgroud)

主循环开始

Node从 中挑选一个Nodes。任何行都可以。保存Node到变量中。同样,我不知道它的正确 Oracle 语法。

SELECT :N = Node FROM Nodes WHERE rownum=1;
Run Code Online (Sandbox Code Playgroud)

如果Nodes为空,则停止循环。

SET :GroupID = :GroupID + 1;
Run Code Online (Sandbox Code Playgroud)

运行递归查询,从Node上面选择的一个特定开始递归。

INSERT INTO Subgraph (Node)
WITH
CTE_Recursive (AnchorNode, Node1, Node2, NodePath, Lvl)
AS
(
    SELECT
        CAST(Nodes.Node AS varchar(2000)) AS AnchorNode
        , Node1
        , Node2
        , CAST(',' || Node1 || ',' || Node2 || ',' AS varchar(2000)) AS NodePath
        , 1 AS Lvl
    FROM 
        Pairs
        INNER JOIN Nodes ON Nodes.Node = Pairs.Node1
    WHERE
        Nodes.Node = :N  -- 'A'
        -- use variable here, don't know what the syntax is

    UNION ALL

    SELECT 
        CTE_Recursive.AnchorNode
        , Pairs.Node1
        , Pairs.Node2
        , CAST(CTE_Recursive.NodePath || Pairs.Node2 || ',' AS varchar(2000)) AS NodePath
        , CTE_Recursive.Lvl + 1 AS Lvl
    FROM
        Pairs
        INNER JOIN CTE_Recursive ON CTE_Recursive.Node2 = Pairs.Node1
    WHERE
        CTE_Recursive.NodePath NOT LIKE CAST('%,' || Pairs.Node2 || ',%' AS varchar(2000))
)
,CTE_Result
AS
(
    SELECT Node1 AS Node
    FROM CTE_Recursive

    UNION

    SELECT Node2 AS Node
    FROM CTE_Recursive
)
SELECT Node FROM CTE_Result
;
Run Code Online (Sandbox Code Playgroud)

此查询将返回连接到给定起始节点的所有节点,即那些形成子图的节点。将其结果集保存到Subgraph临时表中。

将结果附加到最终Result表中,并为找到的子图分配一个 ID。

INSERT INTO Result (GroupID, Node)
SELECT :GroupID, Node
FROM Subgraph;
Run Code Online (Sandbox Code Playgroud)

Nodes和 中删除已处理的节点Pairs

DELETE FROM Nodes
WHERE Node IN (SELECT Node FROM Subgraph)
;

DELETE FROM Pairs
WHERE 
    Node1 IN (SELECT Node FROM Subgraph)
    OR
    Node2 IN (SELECT Node FROM Subgraph)
;
Run Code Online (Sandbox Code Playgroud)

清洁Subgraph桌子

DELETE FROM Subgraph; 
Run Code Online (Sandbox Code Playgroud)

返回到循环的开始。

在最终Result表中将包含具有子图相应 ID 的所有节点。


实际上,您可以进一步简化它。你不需要Nodes桌子,Pairs就足够了。