一组联合查找算法

Mig*_*Mig 15 python algorithm set

我有数千行1到100个数字,每行定义一组数字和它们之间的关系.我需要得到一组相关的数字.

小例子:如果我有这7行数据

T1 T2
T3 
T4
T5
T6 T1
T5 T4
T3 T4 T7
Run Code Online (Sandbox Code Playgroud)

我需要一个不那么慢的算法来知道这里的集合是:

T1 T2 T6 (because T1 is related with T2 in the first line and T1 related with T6 in the line 5)
T3 T4 T5 T7 (because T5 is with T4 in line 6 and T3 is with T4 and T7 in line 7)
Run Code Online (Sandbox Code Playgroud)

但是当你拥有非常大的集合时,在每个大集合中搜索T(x)都会非常缓慢,并且需要集合等等......

你是否有一种暗示以不那么强力的方式做到这一点?

我试图用Python做到这一点.

Jim*_*wis 14

将数字T1,T2等视为图顶点.在一条线上出现的任何两个数字由边连接.那你的问题就等于找到这个图中所有连接的组件.您可以通过从T1开始,然后执行广度优先或深度优先搜索来查找从该点可到达的所有顶点.将所有这些顶点标记为属于等价类T1.然后找到下一个未标记的顶点Ti,找到从那里可到达的所有尚未标记的节点,并将它们标记为属于等价类Ti.继续,直到标记所有顶点.

对于具有n个顶点和e边的图,该算法需要O(e)时间和空间来构建邻接列表,并且O(n)时间和空间用于在构建图结构后识别所有连接的组件.

  • 有像disjoint set这样的结构,您可以使用它们来构建连接的组件.(见问题的标题!) (3认同)

Joh*_*hin 14

一旦构建了数据结构,您想要针对它运行哪些查询?向我们展示您现有的代码.什么是T(x)?你谈的是"数字组",但你的样本数据显示T1,T2等; 请解释.

你读过这个:http://en.wikipedia.org/wiki/Disjoint-set_data_structure

试着看看这个Python实现:http://code.activestate.com/recipes/215912-union-find-data-structure/

或者你可以自己抨击一些相当简单易懂的东西,例如

[更新:全新代码]

class DisjointSet(object):

    def __init__(self):
        self.leader = {} # maps a member to the group's leader
        self.group = {} # maps a group leader to the group (which is a set)

    def add(self, a, b):
        leadera = self.leader.get(a)
        leaderb = self.leader.get(b)
        if leadera is not None:
            if leaderb is not None:
                if leadera == leaderb: return # nothing to do
                groupa = self.group[leadera]
                groupb = self.group[leaderb]
                if len(groupa) < len(groupb):
                    a, leadera, groupa, b, leaderb, groupb = b, leaderb, groupb, a, leadera, groupa
                groupa |= groupb
                del self.group[leaderb]
                for k in groupb:
                    self.leader[k] = leadera
            else:
                self.group[leadera].add(b)
                self.leader[b] = leadera
        else:
            if leaderb is not None:
                self.group[leaderb].add(a)
                self.leader[a] = leaderb
            else:
                self.leader[a] = self.leader[b] = a
                self.group[a] = set([a, b])

data = """T1 T2
T3 T4
T5 T1
T3 T6
T7 T8
T3 T7
T9 TA
T1 T9"""
# data is chosen to demonstrate each of 5 paths in the code
from pprint import pprint as pp
ds = DisjointSet()
for line in data.splitlines():
    x, y = line.split()
    ds.add(x, y)
    print
    print x, y
    pp(ds.leader)
    pp(ds.group)
Run Code Online (Sandbox Code Playgroud)

这是最后一步的输出:

T1 T9
{'T1': 'T1',
 'T2': 'T1',
 'T3': 'T3',
 'T4': 'T3',
 'T5': 'T1',
 'T6': 'T3',
 'T7': 'T3',
 'T8': 'T3',
 'T9': 'T1',
 'TA': 'T1'}
{'T1': set(['T1', 'T2', 'T5', 'T9', 'TA']),
 'T3': set(['T3', 'T4', 'T6', 'T7', 'T8'])}
Run Code Online (Sandbox Code Playgroud)

  • +1 不相交集:用于增量连接组件。 (2认同)

Jam*_*ong 5

您可以使用联合查找数据结构来实现此目标。

这种算法的伪代码如下:

func find( var element )
    while ( element is not the root ) element = element's parent
    return element
end func

func union( var setA, var setB )
    var rootA = find( setA ), rootB = find( setB )
    if ( rootA is equal to rootB ) return
    else
        set rootB as rootA's parent
end func
Run Code Online (Sandbox Code Playgroud)

(摘自http://www.algorithmist.com/index.php/Union_Find