以 COO 格式构建图连接矩阵

Oia*_*ale 5 python numpy graph adjacency-matrix

我在处理图形数据时面临以下子任务:

我需要为具有来自“边界”索引数组的多个完全连接组件的图构造 COO 格式的图连接矩阵。

举个例子,给定数组

borders = [0, 2, 5]
Run Code Online (Sandbox Code Playgroud)

由此产生的 COO 矩阵应该是

coo_matrix = [[0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
              [0, 1, 0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4]].
Run Code Online (Sandbox Code Playgroud)

也就是说,borders数组包含应形成完全连接的子图的节点范围(包括起始索引,排除结束索引)。

我想出了以下算法,但我怀疑性能可以提高:

import numpy as np

def get_coo(borders):

    edge_list = []
    for s, e in zip(borders, borders[1:]):
        
        # create fully-connected subgraph
        arr = np.arange(s, e)
        t = np.array(np.meshgrid(arr, arr)).T.reshape(-1, 2)
        t = t.T

        edge_list.append(t)

    edge_list = np.concatenate(edge_list, axis=1)

    return edge_list
Run Code Online (Sandbox Code Playgroud)

我觉得可能有一个更快的解决方案,也许使用一些 numpy 矢量化运算。

有人有什么想法吗?

Aks*_*gal 4

由于您的目标是比您现有的解决方案更快,因此您可以探索itertools如何有效地解决此问题。在较大列表上进行测试时,此方法的基准测试速度比当前方法快大约 25 倍border

\n
import numpy as np\nfrom itertools import product, chain\n\ndef get_coo(borders):\n    edges = chain(*[product(range(*i),repeat=2) for i in zip(borders, borders[1:])])\n    return list(edges)\n\noutput = get_coo(borders)\n\n## NOTE: Remember to can convert to array and Transpose for all approaches below to get Coo format.\nnp.array(output).T\n
Run Code Online (Sandbox Code Playgroud)\n
array([[0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],\n       [0, 1, 0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4]])\n
Run Code Online (Sandbox Code Playgroud)\n
\n

替代方法和基准:

\n
\n

注意:这些已在您当前的小列表以及由 生成的较大边框列表上进行了基准测试borders = np.arange(300)[np.random.randint(0,2,(300,),dtype=bool)]

\n
\n

完全图的不相交并集

\n

您想要做的本质上是组合不相交的完整图。此类图的邻接矩阵具有沿其对角线选择性项目的完整连接。你可以用networkx它来解决这些问题。

\n

在此输入图像描述

\n

虽然比当前的解决方案慢,但您会发现处理这些图形对象比使用 NumPy 表示图形要容易得多,也更有价值。

\n

方法一:

\n
    \n
  1. 假设节点是按顺序排列的,计算每个子图中的节点数为i
  2. \n
  3. 创建一个完整的矩阵,填充 1s 的形状i*i
  4. \n
  5. 使用组合图nx.disjoint_union_all
  6. \n
  7. 获取该图的边
  8. \n
\n
import numpy as np\nimport networkx as nx\n\ndef get_coo(borders):\n    graphs = [nx.from_numpy_matrix(np.ones((i,i))).to_directed() for i in np.diff(borders)]\n    edges = nx.disjoint_union_all(graphs).edges()\n    return edges\n\n%timeit get_coo(borders)\n\n#Small- 277 \xc2\xb5s \xc2\xb1 33.5 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 1000 loops each)\n#Large- 300 ms \xc2\xb1 36.1 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 1 loop each)\n
Run Code Online (Sandbox Code Playgroud)\n

方法2:

\n
    \n
  1. 迭代borders使用的滚动 2 克元组zip
  2. \n
  3. nx.complete_graph使用ranges(start, end)这些元组创建
  4. \n
  5. 使用组合图nx.disjoint_union_all
  6. \n
  7. 获取该图的边
  8. \n
\n
import numpy as np\nimport networkx as nx\n\ndef get_coo(borders):\n    graphs = [nx.complete_graph(range(*i),nx.DiGraph()) for i in zip(borders, borders[1:])]\n    edges = nx.disjoint_union_all(graphs).edges()\n    return edges\n\n%timeit get_coo(borders)\n\n#Small- 116 \xc2\xb5s \xc2\xb1 13.4 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 10000 loops each)\n#Large- 207 ms \xc2\xb1 35.5 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 1 loop each)\n
Run Code Online (Sandbox Code Playgroud)\n

输出比以前快一点,但缺少节点上的自循环,必须单独添加

\n

使用itertools.product

\n

方法三:

\n
    \n
  1. 迭代borders使用的滚动 2 克元组zip
  2. \n
  3. itertools.product对每个子图使用完全连接的边列表
  4. \n
  5. 用于itertools.chain“附加”两个迭代器
  6. \n
  7. 将它们作为边返回
  8. \n
\n
import numpy as np\nfrom itertools import product, chain\n\ndef get_coo(borders):\n    edges = chain(*[product(range(*i),repeat=2) for i in zip(borders, borders[1:])])\n    return list(edges)\n\n%timeit get_coo(borders)\n\n#Small- 3.91 \xc2\xb5s \xc2\xb1 787 ns per loop (mean \xc2\xb1 std. dev. of 7 runs, 100000 loops each)\n#Large- 183 \xc2\xb5s \xc2\xb1 21.7 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 1000 loops each)\n
Run Code Online (Sandbox Code Playgroud)\n

此方法比当前方法快大约 25 倍

\n

您当前的方法 - 基准

\n
def get_coo(borders):\n\n    edge_list = []\n    for s, e in zip(borders, borders[1:]):\n        \n        # create fully-connected subgraph\n        arr = np.arange(s, e)\n        t = np.array(np.meshgrid(arr, arr)).T.reshape(-1, 2)\n        t = t.T\n\n        edge_list.append(t)\n\n    edge_list = np.concatenate(edge_list, axis=1)\n\n    return edge_list\n\n%timeit get_coo(borders)\n\n#Small- 95.1 \xc2\xb5s \xc2\xb1 10.8 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 10000 loops each)\n#Large- 3.91 ms \xc2\xb1 962 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 100 loops each)\n
Run Code Online (Sandbox Code Playgroud)\n