Algorithm to find disconnected graphs from sets

Jun*_* su 4 python algorithm graph

Goal: efficiently find all disconnected graphs in a large collection of sets.

For example, I have a data file like the following:

A, B, C
C, D, E
A, F, Z
G, J
...

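Each entry represents a set of elements. The first entry, A, B, C, is the set {A, B, C}, which also means there are edges between A and B, A and C, and B and C.

The algorithm I originally came up with is as follows: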

1. Parse all the entries into a list:
[
{A,B,C}
{C,D,E}
...
]
2. Start with the first element (set) of the list and call it start_entry, {A,B,C} in this case.
3. Traverse the other elements in the list and do the following:
     if the intersection of the element and start_entry is not empty
          start_entry = start_entry union the element
          remove the element from the list
4. With the updated start_entry, traverse the list again until there are no new updates.
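
For reference, the steps above can be sketched in Python roughly as follows. This is only one reading of the pseudocode: the function name `components_by_repeated_merging` is made up for illustration, and the outer loop that restarts with the next unmerged entry is an assumption, since the steps describe building a single component.

```python
def components_by_repeated_merging(entries):
    """Group entries into connected components by repeated set merging."""
    remaining = [set(entry) for entry in entries]      # step 1
    components = []
    while remaining:
        current = remaining.pop(0)                     # step 2: start_entry
        changed = True
        while changed:                                 # step 4: repeat until stable
            changed = False
            untouched = []
            for other in remaining:                    # step 3: scan the rest
                if current & other:                    # non-empty intersection
                    current |= other                   # merge into start_entry
                    changed = True
                else:
                    untouched.append(other)
            remaining = untouched                      # merged entries are dropped
        components.append(current)
    return components


entries = [["A", "B", "C"], ["C", "D", "E"], ["A", "F", "Z"], ["G", "J"]]
result = components_by_repeated_merging(entries)
print(result)
```

Every pass rescans all remaining entries, and a long chain of overlapping sets can need many passes, so in the worst case the number of intersections grows roughly quadratically with the number of entries.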

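The algorithm above should return the list of vertices of a connected graph. However, because of the size of the dataset (about 100,000 entries), I ran into runtime problems. So I am wondering whether anyone knows a more efficient way to find the connected graphs.

The data structure could also be changed (if that is easier) to A,B B,C E,F ... where each entry represents an edge of the graph.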
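
If the edge form is preferred, a set does not need all of its pairs listed: for the purpose of finding connected graphs, linking consecutive members is enough. A minimal sketch, where `sets_to_edges` is a hypothetical helper name and not part of any library:

```python
def sets_to_edges(entries):
    """Turn each set into a chain of edges that keeps its members connected."""
    edges = []
    for entry in entries:
        members = list(entry)
        # Pair each member with the next one: A-B, B-C, ...
        # Note: an entry with a single member yields no edge at all,
        # so isolated vertices would have to be tracked separately.
        for u, v in zip(members, members[1:]):
            edges.append((u, v))
    return edges


edges = sets_to_edges([["A", "B", "C"], ["C", "D", "E"], ["G", "J"]])
print(edges)
```

This produces k - 1 edges for a set of k elements instead of all k(k - 1)/2 pairs, and the connected graphs stay the same.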

Pet*_*vaz 5

This looks like an ideal case for the disjoint-set data structure.

It lets you join sets together in almost linear time.

Example Python code

from collections import defaultdict

data=["A","B","C"],["C","D","E"],["F","G"]

# Prepare mapping from data element to index
S = {}
for a in data:
    for x in a:
        if x not in S:
            S[x] = len(S)

N = len(S)
rank=[0]*N
parent=list(range(N))  # a list, so entries can be reassigned on Python 3 as well

def Find(x):
    """Find representative of connected component"""
    if  parent[x] != x:
        parent[x] = Find(parent[x])
    return parent[x]

def Union(x,y):
    """Merge sets containing elements x and y"""
    x = Find(x)
    y = Find(y)
    if x == y:
        return
    if rank[x]<rank[y]:
        parent[x] = y
    elif rank[x]>rank[y]:
        parent[y] = x
    else:
        parent[y] = x
        rank[x] += 1

# Merge all sets
for a in data:
    x = a[0]
    for y in a[1:]:
        Union(S[x],S[y])

# Report disconnected graphs
V=defaultdict(list)
for x in S:
    V[Find(S[x])].append(x)

print(list(V.values()))

This prints (on Python 3.7+, where dicts preserve insertion order):

[['A', 'B', 'C', 'D', 'E'], ['F', 'G']]
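
To apply the same idea directly to lines in the question's format ("A, B, C"), here is a hedged variant rather than the code above: it keys the disjoint set by element name in a dict, uses union by size instead of union by rank, and an iterative find with path halving. The names `find` and `connected_components` are chosen for illustration, and the input is assumed to be an iterable of comma-separated text lines.

```python
from collections import defaultdict


def find(parent, x):
    """Return the representative of x, shortening the path as it goes."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def connected_components(lines):
    """Group element names from comma-separated lines into components."""
    parent = {}
    size = {}
    for line in lines:
        names = [name.strip() for name in line.split(",") if name.strip()]
        for name in names:
            if name not in parent:
                parent[name] = name
                size[name] = 1
        # Joining everything to the first element connects the whole set.
        for other in names[1:]:
            a, b = find(parent, names[0]), find(parent, other)
            if a == b:
                continue
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a              # attach the smaller tree to the larger
            size[a] += size[b]
    groups = defaultdict(list)
    for name in parent:
        groups[find(parent, name)].append(name)
    return list(groups.values())


lines = ["A, B, C", "C, D, E", "A, F, Z", "G, J"]
components = connected_components(lines)
print(components)
```

With a real data file, passing the open file object instead of the `lines` list should work the same way, since iterating over a file yields one line at a time.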