Tri*_*son 12 python data-mining networkx
我正在比较同构的一大组networkx图,其中大多数图不应该是同构的(例如,假设0-20%与列表中的某些东西是同构的).
我尝试了以下方法.
graphs = [] # A list of networkx graphs
unique = [] # A list of unique graphs
for new in graphs:
for old in unique:
if nx.is_isomorphic(new, old[0]):
break
else:
unique.append([new])
Run Code Online (Sandbox Code Playgroud)
这让我得到了一个更快的缩小集,但我仍然发现它太慢而不适合理想使用.是否有一些更快的算法来处理这类问题(比较传递交换属性对)或将此算法扩展到多核设置(在20核机器上运行)的方法.
我已经过滤这些集合基于节点/边的数量数据,我们可以假设nx.is_isomorphic功能不能进行任何过滤类型的操作速度更快.我现在也无法轻松更改工具,因此使用编译包不是一种选择.
附加信息:
图形倾向于大约16-20个节点,总共24-48个边缘,存在大量互连,因此每个节点具有大约8个边缘.每个边缘也都有标记,但是只使用了2-3种边缘.
你可以使用nauty(http://users.cecs.anu.edu.au/~bdm/nauty/,可在linux发行版中找到)吗?这有一个规范的标签算法,速度快,可能适用于您的问题.规范标记使同构图相同(经典化).例如,使用来自一组随机图的graph6格式输出给出以下同构图的计数
$ cat g6.py
import networkx as nx
for i in range(100000):
print(nx.generate_graph6(nx.fast_gnp_random_graph(4,0.2),header=False))
$ python g6.py |nauty-labelg |sort |uniq -c
>A labelg
>Z 100000 graphs labelled from stdin to stdout in 0.21 sec.
4898 C`
167 C^
10 C~
26408 C?
39392 C@
19684 CB
1575 CF
1608 CJ
1170 CN
288 Cr
4800 CR
Run Code Online (Sandbox Code Playgroud)
那些是4个节点的11个图 -
$ cat atlas.py
import networkx as nx
for g in nx.atlas.graph_atlas_g()[8:19]:
print(nx.generate_graph6(g,header=False))
$ python atlas.py |nauty-labelg |sort |uniq -c
>A labelg
>Z 11 graphs labelled from stdin to stdout in 0.00 sec.
1 C`
1 C^
1 C~
1 C?
1 C@
1 CB
1 CF
1 CJ
1 CN
1 Cr
1 CR
Run Code Online (Sandbox Code Playgroud)
如果它运行得太慢,那么并行化这种方法将非常容易.
正如其他人所提到的,如果你想留在Python + Networkx中,你可以could_be_isomorphic用来过滤你的图形.
问题是这种方法需要2个图作为输入,而不是数百万.如果用这种方法比较每对图形,则需要很长时间.
查看源代码could_be_isomorphic,它比较两个图的团队,三角和clique序列的数量.如果它们不相等,则图形不能是同构的.
您可以在功能中打包此指纹,根据此指纹对图表进行排序并将其分组itertools.groupby.将会有绝大多数单独的图表.然后可以检查具有相同指纹的少数图形的同构.
使用100 000个随机图表列表:
many_graphs = [nx.fast_gnp_random_graph(random.randint(16, 22), 0.2) for _ in range(100000)]
Run Code Online (Sandbox Code Playgroud)
至少有2个图表共享了500个指纹.如果添加边缘类型信息,则会出现更少的常见指纹.
这是一个包含3000个图形的示例,每个图形包含10到14个节点:
import networkx as nx
from itertools import groupby
import random
many_graphs = [nx.fast_gnp_random_graph(
random.randint(10, 14), 0.3) for _ in range(3000)]
def graph_fingerprint(g):
order = g.order()
d = g.degree()
t = nx.triangles(g)
c = nx.number_of_cliques(g)
props = [[d[v], t[v], c[v]] for v in d]
props.sort()
# TODO: Add count of edge types.
return(props)
sorted_graphs = sorted(many_graphs, key=graph_fingerprint)
for f, g in groupby(sorted_graphs, key=graph_fingerprint):
similar_graphs = list(g)
n = len(similar_graphs)
if n > 1:
print("Found %d graphs which could be isomorphic." % n)
for i in range(n):
for j in range(i + 1, n):
g1, g2 = similar_graphs[i], similar_graphs[j]
if g1 != g2 and nx.is_isomorphic(g1, g2):
print(" %s and %s are isomorphic!" %
(nx.generate_graph6(g1,header=False), nx.generate_graph6(g2,header=False)))
Run Code Online (Sandbox Code Playgroud)
它在不到1s内找到4个同构对:
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Id?OGG_C? and IaKO_@?@? are isomorphic!
Found 6 graphs which could be isomorphic.
I?OWcGG?G and I?OCSa?@_ are isomorphic!
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
I_uI???JG and II??QDNA? are isomorphic!
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
IDOCCY@GG and IOGC@`dS? are isomorphic!
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Run Code Online (Sandbox Code Playgroud)
这是最后两个同构图."IDOCCY @ GG":
和"IOGC @\dS?":
这是2个具有相同指纹但不同构的图表:
指纹识别可以并行完成.排序和分组必须在1个CPU上进行,但每个组的同构检查可以在不同的CPU中完成.
| 归档时间: |
|
| 查看次数: |
1010 次 |
| 最近记录: |