绘制由 NetworkX Girvan-Newman 算法找到的社区的树状图

Gio*_*oni 5 python matplotlib dendrogram scipy networkx

用于网络社区检测的 Girvan-Newman 算法:

通过逐步从原始图中移除边来检测社区。该算法会在每一步删除“最有价值”的边缘,传统上是具有最高中介中心性的边缘。当图表分解成碎片时,紧密结合的社区结构暴露出来,结果可以用树状图来描述。

在 NetworkX 中,实现返回集合元组上的迭代器。第一个元组是由 2 个社区组成的第一个切割,第二个元组是由 3 个社区组成的第二个切割,依此类推,直到最后一个元组具有 n 个单独节点(树状图的叶子)的 n 个集合。

import networkx as nx

G = nx.path_graph(10)
comp = nx.community.girvan_newman(G)
list(comp)
Run Code Online (Sandbox Code Playgroud)

[({0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}), ({0, 1}, {2, 3, 4}, {5, 6, 7, 8 , 9}), ({0, 1}, {2, 3, 4}, {5, 6}, {8, 9, 7}), ({0, 1}, {2}, {3, 4 }, {5, 6}, {8, 9, 7}), ({0, 1}, {2}, {3, 4}, {5, 6}, {7}, {8, 9}) , ({0}, {1}, {2}, {3, 4}, {5, 6}, {7}, {8, 9}), ({0}, {1}, {2}, {3}, {4}, {5, 6}, {7}, {8, 9}), ({0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8, 9}), ({0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8 }, {9})]

问题是:如何绘制这个树状图?

我已经看过了,scipy.cluster.hierarchy.dendrogram但它需要一个我猜的“链接矩阵”,例如由创建的那个scipy.cluster.hierarchy.linkage,但我不确定如何将这个元组列表转换为这个“链接矩阵”。

所以我问如何在有/没有 SciPy's 帮助的情况下绘制这个树状图dendrogram

Gio*_*oni 7

在@ItamarMushkin之后,我按照@mdml的答案进行了轻微的修改,并得到了我想要的。在高层次上,我将 NetworkX 的 Girvan-Newman 迭代器输出转换为另一个DiGraph()我最终希望看到的树状图。然后,我构建Z一个连接矩阵scipy.cluster.hierarchy.dendrogram,以边缘列表的形式输入到 ,其中包括每个树状图合并的实际高度。

我必须对 @mdml 的答案进行两处修改:

  1. 没那么重要:我对输入的节点的元组键进行排序index
  2. 更重要的是:我添加了一个get_merge_height函数,该函数根据边缘去除的 Girvan-Newman 输出顺序为每个合并提供唯一的高度。否则,两个节点的所有合并在树状图中将具有相同的高度,合并两个节点的下一级中的所有合并和另一个节点将具有相同的高度,等等。

我知道这里可能有一些冗余迭代,我还没有考虑过优化。

import networkx as nx
from itertools import chain, combinations
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

# get simulated Graph() and Girvan-Newman communities list
G = nx.path_graph(10)
communities = list(nx.community.girvan_newman(G))

# building initial dict of node_id to each possible subset:
node_id = 0
init_node2community_dict = {node_id: communities[0][0].union(communities[0][1])}
for comm in communities:
    for subset in list(comm):
        if subset not in init_node2community_dict.values():
            node_id += 1
            init_node2community_dict[node_id] = subset

# turning this dictionary to the desired format in @mdml's answer
node_id_to_children = {e: [] for e in init_node2community_dict.keys()}
for node_id1, node_id2 in combinations(init_node2community_dict.keys(), 2):
    for node_id_parent, group in init_node2community_dict.items():
        if len(init_node2community_dict[node_id1].intersection(init_node2community_dict[node_id2])) == 0 and group == init_node2community_dict[node_id1].union(init_node2community_dict[node_id2]):
            node_id_to_children[node_id_parent].append(node_id1)
            node_id_to_children[node_id_parent].append(node_id2)

# also recording node_labels dict for the correct label for dendrogram leaves
node_labels = dict()
for node_id, group in init_node2community_dict.items():
    if len(group) == 1:
        node_labels[node_id] = list(group)[0]
    else:
        node_labels[node_id] = ''

# also needing a subset to rank dict to later know within all k-length merges which came first
subset_rank_dict = dict()
rank = 0
for e in communities[::-1]:
    for p in list(e):
        if tuple(p) not in subset_rank_dict:
            subset_rank_dict[tuple(sorted(p))] = rank
            rank += 1
subset_rank_dict[tuple(sorted(chain.from_iterable(communities[-1])))] = rank

# my function to get a merge height so that it is unique (probably not that efficient)
def get_merge_height(sub):
    sub_tuple = tuple(sorted([node_labels[i] for i in sub]))
    n = len(sub_tuple)
    other_same_len_merges = {k: v for k, v in subset_rank_dict.items() if len(k) == n}
    min_rank, max_rank = min(other_same_len_merges.values()), max(other_same_len_merges.values())
    range = (max_rank-min_rank) if max_rank > min_rank else 1
    return float(len(sub)) + 0.8 * (subset_rank_dict[sub_tuple] - min_rank) / range

# finally using @mdml's magic, slightly modified:
G           = nx.DiGraph(node_id_to_children)
nodes       = G.nodes()
leaves      = set( n for n in nodes if G.out_degree(n) == 0 )
inner_nodes = [ n for n in nodes if G.out_degree(n) > 0 ]

# Compute the size of each subtree
subtree = dict( (n, [n]) for n in leaves )
for u in inner_nodes:
    children = set()
    node_list = list(node_id_to_children[u])
    while len(node_list) > 0:
        v = node_list.pop(0)
        children.add( v )
        node_list += node_id_to_children[v]
    subtree[u] = sorted(children & leaves)

inner_nodes.sort(key=lambda n: len(subtree[n])) # <-- order inner nodes ascending by subtree size, root is last

# Construct the linkage matrix
leaves = sorted(leaves)
index  = dict( (tuple([n]), i) for i, n in enumerate(leaves) )
Z = []
k = len(leaves)
for i, n in enumerate(inner_nodes):
    children = node_id_to_children[n]
    x = children[0]
    for y in children[1:]:
        z = tuple(sorted(subtree[x] + subtree[y]))
        i, j = index[tuple(sorted(subtree[x]))], index[tuple(sorted(subtree[y]))]
        Z.append([i, j, get_merge_height(subtree[n]), len(z)]) # <-- float is required by the dendrogram function
        index[z] = k
        subtree[z] = list(z)
        x = z
        k += 1

# dendrogram
plt.figure()
dendrogram(Z, labels=[node_labels[node_id] for node_id in leaves])
plt.savefig('dendrogram.png')
Run Code Online (Sandbox Code Playgroud)

在此输入图像描述