以 COO 格式构建图连接矩阵

Question

以 COO 格式构建图连接矩阵

Oia*_*ale 5 python numpy graph adjacency-matrix

我在处理图形数据时面临以下子任务：

我需要为具有来自“边界”索引数组的多个完全连接组件的图构造 COO 格式的图连接矩阵。

举个例子，给定数组

borders = [0, 2, 5]

Run Code Online (Sandbox Code Playgroud)

由此产生的 COO 矩阵应该是

coo_matrix = [[0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
              [0, 1, 0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4]].

Run Code Online (Sandbox Code Playgroud)

也就是说，borders数组包含应形成完全连接的子图的节点范围（包括起始索引，排除结束索引）。

我想出了以下算法，但我怀疑性能可以提高：

import numpy as np

def get_coo(borders):

    edge_list = []
    for s, e in zip(borders, borders[1:]):
        
        # create fully-connected subgraph
        arr = np.arange(s, e)
        t = np.array(np.meshgrid(arr, arr)).T.reshape(-1, 2)
        t = t.T

        edge_list.append(t)

    edge_list = np.concatenate(edge_list, axis=1)

    return edge_list

Run Code Online (Sandbox Code Playgroud)

我觉得可能有一个更快的解决方案，也许使用一些 numpy 矢量化运算。

有人有什么想法吗？

Answer 1

Aks*_*gal 4

由于您的目标是比您现有的解决方案更快，因此您可以探索itertools如何有效地解决此问题。在较大列表上进行测试时，此方法的基准测试速度比当前方法快大约 25 倍border。

\n

import numpy as np\nfrom itertools import product, chain\n\ndef get_coo(borders):\n    edges = chain(*[product(range(*i),repeat=2) for i in zip(borders, borders[1:])])\n    return list(edges)\n\noutput = get_coo(borders)\n\n## NOTE: Remember to can convert to array and Transpose for all approaches below to get Coo format.\nnp.array(output).T\n

Run Code Online (Sandbox Code Playgroud)\n

array([[0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],\n       [0, 1, 0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4]])\n

Run Code Online (Sandbox Code Playgroud)\n

\n

替代方法和基准：

\n

\n
注意：这些已在您当前的小列表以及由生成的较大边框列表上进行了基准测试borders = np.arange(300)[np.random.randint(0,2,(300,),dtype=bool)]
\n

\n

完全图的不相交并集

\n

您想要做的本质上是组合不相交的完整图。此类图的邻接矩阵具有沿其对角线选择性项目的完整连接。你可以用networkx它来解决这些问题。

\n

虽然比当前的解决方案慢，但您会发现处理这些图形对象比使用 NumPy 表示图形要容易得多，也更有价值。

\n

方法一：

\n

假设节点是按顺序排列的，计算每个子图中的节点数为i
创建一个完整的矩阵，填充 1s 的形状i*i
使用组合图nx.disjoint_union_all
获取该图的边

\n

import numpy as np\nimport networkx as nx\n\ndef get_coo(borders):\n    graphs = [nx.from_numpy_matrix(np.ones((i,i))).to_directed() for i in np.diff(borders)]\n    edges = nx.disjoint_union_all(graphs).edges()\n    return edges\n\n%timeit get_coo(borders)\n\n#Small- 277 \xc2\xb5s \xc2\xb1 33.5 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 1000 loops each)\n#Large- 300 ms \xc2\xb1 36.1 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 1 loop each)\n

Run Code Online (Sandbox Code Playgroud)\n

方法2：

\n

迭代borders使用的滚动 2 克元组zip
nx.complete_graph使用ranges(start, end)这些元组创建
使用组合图nx.disjoint_union_all
获取该图的边

\n

import numpy as np\nimport networkx as nx\n\ndef get_coo(borders):\n    graphs = [nx.complete_graph(range(*i),nx.DiGraph()) for i in zip(borders, borders[1:])]\n    edges = nx.disjoint_union_all(graphs).edges()\n    return edges\n\n%timeit get_coo(borders)\n\n#Small- 116 \xc2\xb5s \xc2\xb1 13.4 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 10000 loops each)\n#Large- 207 ms \xc2\xb1 35.5 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 1 loop each)\n

Run Code Online (Sandbox Code Playgroud)\n

输出比以前快一点，但缺少节点上的自循环，必须单独添加

\n

使用`itertools.product`

\n

方法三：

\n

迭代borders使用的滚动 2 克元组zip
itertools.product对每个子图使用完全连接的边列表
用于itertools.chain“附加”两个迭代器
将它们作为边返回

\n

import numpy as np\nfrom itertools import product, chain\n\ndef get_coo(borders):\n    edges = chain(*[product(range(*i),repeat=2) for i in zip(borders, borders[1:])])\n    return list(edges)\n\n%timeit get_coo(borders)\n\n#Small- 3.91 \xc2\xb5s \xc2\xb1 787 ns per loop (mean \xc2\xb1 std. dev. of 7 runs, 100000 loops each)\n#Large- 183 \xc2\xb5s \xc2\xb1 21.7 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 1000 loops each)\n

Run Code Online (Sandbox Code Playgroud)\n

此方法比当前方法快大约 25 倍

\n

您当前的方法 - 基准

\n

def get_coo(borders):\n\n    edge_list = []\n    for s, e in zip(borders, borders[1:]):\n        \n        # create fully-connected subgraph\n        arr = np.arange(s, e)\n        t = np.array(np.meshgrid(arr, arr)).T.reshape(-1, 2)\n        t = t.T\n\n        edge_list.append(t)\n\n    edge_list = np.concatenate(edge_list, axis=1)\n\n    return edge_list\n\n%timeit get_coo(borders)\n\n#Small- 95.1 \xc2\xb5s \xc2\xb1 10.8 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 10000 loops each)\n#Large- 3.91 ms \xc2\xb1 962 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 100 loops each)\n

Run Code Online (Sandbox Code Playgroud)\n

归档时间：	4 年，1 月前
查看次数：	2569 次
最近记录：	4 年，1 月前

以 COO 格式构建图连接矩阵

替代方法和基准：

完全图的不相交并集

使用itertools.product

您当前的方法 - 基准

使用`itertools.product`