Oia*_*ale 5 python numpy graph adjacency-matrix
我在处理图形数据时面临以下子任务:
我需要为具有来自“边界”索引数组的多个完全连接组件的图构造 COO 格式的图连接矩阵。
举个例子,给定数组
borders = [0, 2, 5]
Run Code Online (Sandbox Code Playgroud)
由此产生的 COO 矩阵应该是
coo_matrix = [[0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
[0, 1, 0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4]].
Run Code Online (Sandbox Code Playgroud)
也就是说,borders数组包含应形成完全连接的子图的节点范围(包括起始索引,排除结束索引)。
我想出了以下算法,但我怀疑性能可以提高:
import numpy as np
def get_coo(borders):
edge_list = []
for s, e in zip(borders, borders[1:]):
# create fully-connected subgraph
arr = np.arange(s, e)
t = np.array(np.meshgrid(arr, arr)).T.reshape(-1, 2)
t = t.T
edge_list.append(t)
edge_list = np.concatenate(edge_list, axis=1)
return edge_list
Run Code Online (Sandbox Code Playgroud)
我觉得可能有一个更快的解决方案,也许使用一些 numpy 矢量化运算。
有人有什么想法吗?
由于您的目标是比您现有的解决方案更快,因此您可以探索itertools
如何有效地解决此问题。在较大列表上进行测试时,此方法的基准测试速度比当前方法快大约 25 倍border
。
import numpy as np\nfrom itertools import product, chain\n\ndef get_coo(borders):\n edges = chain(*[product(range(*i),repeat=2) for i in zip(borders, borders[1:])])\n return list(edges)\n\noutput = get_coo(borders)\n\n## NOTE: Remember to can convert to array and Transpose for all approaches below to get Coo format.\nnp.array(output).T\n
Run Code Online (Sandbox Code Playgroud)\narray([[0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],\n [0, 1, 0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4]])\n
Run Code Online (Sandbox Code Playgroud)\n\n\n注意:这些已在您当前的小列表以及由 生成的较大边框列表上进行了基准测试
\nborders = np.arange(300)[np.random.randint(0,2,(300,),dtype=bool)]
您想要做的本质上是组合不相交的完整图。此类图的邻接矩阵具有沿其对角线选择性项目的完整连接。你可以用networkx
它来解决这些问题。
虽然比当前的解决方案慢,但您会发现处理这些图形对象比使用 NumPy 表示图形要容易得多,也更有价值。
\n方法一:
\ni
i*i
nx.disjoint_union_all
import numpy as np\nimport networkx as nx\n\ndef get_coo(borders):\n graphs = [nx.from_numpy_matrix(np.ones((i,i))).to_directed() for i in np.diff(borders)]\n edges = nx.disjoint_union_all(graphs).edges()\n return edges\n\n%timeit get_coo(borders)\n\n#Small- 277 \xc2\xb5s \xc2\xb1 33.5 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 1000 loops each)\n#Large- 300 ms \xc2\xb1 36.1 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 1 loop each)\n
Run Code Online (Sandbox Code Playgroud)\n方法2:
\nborders
使用的滚动 2 克元组zip
nx.complete_graph
使用ranges(start, end)
这些元组创建nx.disjoint_union_all
import numpy as np\nimport networkx as nx\n\ndef get_coo(borders):\n graphs = [nx.complete_graph(range(*i),nx.DiGraph()) for i in zip(borders, borders[1:])]\n edges = nx.disjoint_union_all(graphs).edges()\n return edges\n\n%timeit get_coo(borders)\n\n#Small- 116 \xc2\xb5s \xc2\xb1 13.4 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 10000 loops each)\n#Large- 207 ms \xc2\xb1 35.5 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 1 loop each)\n
Run Code Online (Sandbox Code Playgroud)\n输出比以前快一点,但缺少节点上的自循环,必须单独添加
\nitertools.product
方法三:
\nborders
使用的滚动 2 克元组zip
itertools.product
对每个子图使用完全连接的边列表itertools.chain
“附加”两个迭代器import numpy as np\nfrom itertools import product, chain\n\ndef get_coo(borders):\n edges = chain(*[product(range(*i),repeat=2) for i in zip(borders, borders[1:])])\n return list(edges)\n\n%timeit get_coo(borders)\n\n#Small- 3.91 \xc2\xb5s \xc2\xb1 787 ns per loop (mean \xc2\xb1 std. dev. of 7 runs, 100000 loops each)\n#Large- 183 \xc2\xb5s \xc2\xb1 21.7 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 1000 loops each)\n
Run Code Online (Sandbox Code Playgroud)\n此方法比当前方法快大约 25 倍
\ndef get_coo(borders):\n\n edge_list = []\n for s, e in zip(borders, borders[1:]):\n \n # create fully-connected subgraph\n arr = np.arange(s, e)\n t = np.array(np.meshgrid(arr, arr)).T.reshape(-1, 2)\n t = t.T\n\n edge_list.append(t)\n\n edge_list = np.concatenate(edge_list, axis=1)\n\n return edge_list\n\n%timeit get_coo(borders)\n\n#Small- 95.1 \xc2\xb5s \xc2\xb1 10.8 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 10000 loops each)\n#Large- 3.91 ms \xc2\xb1 962 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 100 loops each)\n
Run Code Online (Sandbox Code Playgroud)\n
归档时间: |
|
查看次数: |
2569 次 |
最近记录: |