Python: sorting a dependency list

Dis*_*ard 13 python sorting topological-sort

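I'm trying to work out whether my problem can be solved with the built-in sorted() function or whether I need to do it myself. The old-school way, using cmp, would have been relatively easy.

My dataset looks like: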

x = [
('business', Set('fleet','address'))
('device', Set('business','model','status','pack'))
('txn', Set('device','business','operator'))
....

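The sorting rule should basically be: for all values of N and Y where Y > N, x[N][0] not in x[Y][1].

Although I'm using Python 2.6, where the cmp argument is still available, I'm trying to make this Python 3 safe.

So, can this be done using some lambda magic and the key argument?

-== UPDATE ==-

Thanks Eli & Winston! I didn't really think using key would work, and I suspected that if it could, it would be a shoehorned solution, which isn't ideal.

Because my problem was database table dependencies, I had to make a minor addition to Eli's code to remove a table from its own list of dependencies (in a well-designed database this wouldn't happen, but who lives in that magical perfect world?).

My solution: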

def topological_sort(source):
    """perform topo sort on elements.

    :arg source: list of ``(name, set(names of dependencies))`` pairs
    :returns: generator of names, with dependencies listed first
    """
    pending = [(name, set(deps)) for name, deps in source]
    emitted = []
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            # drop the self-reference and the names emitted last pass (multi-arg form needs Python 2.6+)
            deps.difference_update(set((name,)), emitted)
            if deps:
                next_pending.append(entry)
            else:
                yield name
                emitted.append(name) # <-- not required, but preserves original order
                next_emitted.append(name)
        if not next_emitted:
            raise ValueError("cyclic dependency detected: %s %r" % (name, (next_pending,)))
        pending = next_pending
        emitted = next_emitted
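The one change from Eli's version is the extra argument to `difference_update()`. A minimal sketch of that single call in isolation, using a hypothetical `category` table with a self-referencing foreign key (the table names and the `emitted` contents are made up for illustration):

```python
# Hypothetical table with a self-referencing foreign key.
name = 'category'
deps = {'category', 'business', 'device'}
emitted = ['business']  # names already yielded in an earlier pass

# A single call removes both the self-reference and the emitted names.
# Passing several iterables at once is accepted since Python 2.6.
deps.difference_update({name}, emitted)

remaining = sorted(deps)
print(remaining)
```

Without the `{name}` argument, `category` would never leave its own dependency set and the sort would report a cycle.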

Eli*_*ins 16

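What you want is called a topological sort. While it's possible to implement one using the built-in sort(), it's rather awkward, and it's better to implement a topological sort directly in Python.

Why is it going to be awkward? If you study the two algorithms on the wiki page, they both rely on a running set of "marked nodes", a concept that's hard to contort into a form sort() can use, since key=xxx (or even cmp=xxx) works best with stateless comparison functions, particularly because timsort doesn't guarantee the order in which the elements will be examined. I'm (pretty) sure that any solution which does use sort() is going to end up redundantly calculating some information on each call to the key/cmp function, in order to get around the statelessness issue.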
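To make the trade-off concrete, here is a rough sketch of what a key-based approach could look like: sort by "dependency depth" (the length of the longest chain below a name) and cache the depths so they are not recomputed on every key call. This is not from the thread. `sort_by_depth` and `depth` are hypothetical names, and the sketch assumes there are no cycles or self-references and that any name missing from the list has no dependencies.

```python
from functools import lru_cache

def sort_by_depth(source):
    """Order names so that every dependency precedes its dependents.

    Assumes an acyclic input; a cycle would recurse until RecursionError.
    """
    deps_of = {name: frozenset(deps) for name, deps in source}

    @lru_cache(maxsize=None)
    def depth(name):
        # Names that are not listed in source, or have no deps, get depth 0.
        deps = deps_of.get(name, ())
        return 1 + max(map(depth, deps)) if deps else 0

    # A dependent is always deeper than anything it depends on, and
    # sorted() is stable, so equal-depth names keep their input order.
    return sorted(deps_of, key=depth)

tables = [
    ('txn', {'device', 'business', 'operator'}),
    ('device', {'business', 'model', 'status', 'pack'}),
    ('business', {'fleet', 'address'}),
]
print(sort_by_depth(tables))
```

The cache is the "state" that a plain key function lacks, and the sketch has no cycle detection, which is part of why a dedicated topological sort is the cleaner choice.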

Following is the algorithm I've been using (to sort some JavaScript library dependencies):

Edit: reworked this greatly, based on Winston Ewert's solution.

def topological_sort(source):
    """perform topo sort on elements.

    :arg source: list of ``(name, [list of dependencies])`` pairs
    :returns: generator of names, with dependencies listed first
    """
    pending = [(name, set(deps)) for name, deps in source] # copy deps so we can modify set in-place
    emitted = []
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            deps.difference_update(emitted) # remove deps we emitted last pass
            if deps: # still has deps? recheck during next pass
                next_pending.append(entry) 
            else: # no more deps? time to emit
                yield name 
                emitted.append(name) # <-- not required, but helps preserve original ordering
                next_emitted.append(name) # remember what we emitted for difference_update() in next pass
        if not next_emitted: # all entries have unmet deps, one of two things is wrong...
            raise ValueError("cyclic or missing dependency detected: %r" % (next_pending,))
        pending = next_pending
        emitted = next_emitted

Side note: it is possible to shoehorn a cmp() function into key=xxx, as outlined in this Python bug tracker message.
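For reference, the standard library ships a ready-made wrapper for this, `functools.cmp_to_key` (available in Python 2.7 and 3.2+, so not in 2.6). Whether it matches the approach in the linked message is not confirmed here. A small sketch with a hypothetical comparison function, `by_length_then_alpha`:

```python
from functools import cmp_to_key

def by_length_then_alpha(a, b):
    # Old-style cmp contract: negative, zero, or positive.
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)

names = ['device', 'txn', 'business', 'pack']
ordered = sorted(names, key=cmp_to_key(by_length_then_alpha))
print(ordered)
```

This only helps when the comparison defines a consistent total order. A dependency relation is a partial order, so wrapping a "depends on" comparison this way is not a reliable substitute for a topological sort.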


Win*_*ert 6

I do a topological sort something like this:

class TopologicalSortFailure(Exception):
    """Raised when a pass emits nothing: cyclic or missing dependency."""

def topological_sort(items):
    # items: list of (item, set_of_dependencies) pairs
    provided = set()
    while items:
        remaining_items = []
        emitted = False

        for item, dependencies in items:
            if dependencies.issubset(provided):
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))

        if not emitted:
            raise TopologicalSortFailure()

        items = remaining_items

I think it's a little more straightforward than Eli's version; I don't know about efficiency.
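On the efficiency question: both generators above rescan the whole pending list on every pass, so a long dependency chain listed in the worst order can take roughly quadratic time. A sketch of Kahn's algorithm, which touches each item and each dependency edge once, is shown below. It is not from the thread. `kahn_sort` is a hypothetical name, and the sketch assumes that self-references are ignored and that names which never appear as an item count as already satisfied (unlike Eli's version, which raises on missing dependencies).

```python
from collections import deque

def kahn_sort(source):
    """Return names with dependencies first; raise ValueError on a cycle."""
    # Copy the dependency sets and drop self-references.
    deps_of = {name: set(deps) - {name} for name, deps in source}
    known = set(deps_of)

    # Count unmet dependencies and record who is waiting on whom.
    unmet = {name: len(deps & known) for name, deps in deps_of.items()}
    dependents = {name: [] for name in deps_of}
    for name, deps in deps_of.items():
        for dep in deps & known:
            dependents[dep].append(name)

    ready = deque(name for name, count in unmet.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        # Emitting `name` satisfies one dependency of each dependent.
        for child in dependents[name]:
            unmet[child] -= 1
            if unmet[child] == 0:
                ready.append(child)

    if len(order) != len(deps_of):
        raise ValueError("cyclic dependency detected")
    return order
```

For a handful of database tables the difference is unlikely to matter, and the pass-based versions are easier to read.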


Jon*_*nts 5

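Looking past the bad formatting and that strange Set type... (I've kept them as tuples and delimited the list items correctly...) ...and using the networkx library to keep things convenient...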

x = [
    ('business', ('fleet','address')),
    ('device', ('business','model','status','pack')),
    ('txn', ('device','business','operator'))
]

import networkx as nx

g = nx.DiGraph()
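for key, vals in x:
    for val in vals:
        g.add_edge(key, val)  # edge points from an item to each of its dependencies

# With edges in this direction, dependents are listed before their dependencies.
# list() because newer networkx versions return a generator; print() is Python 3 safe.
print(list(nx.topological_sort(g)))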

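  • One caveat about this solution: it only works if the dependencies form a fully connected graph. If there are nodes without any dependencies (and therefore without any edges to other nodes), they won't be included in the output of `topological_sort()`. (2 upvotes)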