How to convert a co-occurrence matrix to a networkx graph

EmJ*_*EmJ 3 python graph networkx pandas

I am using the following code to convert my list of lists into a co-occurrence matrix.

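import numpy as np
import pandas as pd

lst = [
    ['a', 'b'],
    ['b', 'c', 'd', 'e'],
    ['a', 'd'],
    ['b', 'e']
]

# one 0/1 column per item; the duplicated labels (one per original column) are summed
u = (pd.get_dummies(pd.DataFrame(lst), prefix='', prefix_sep='', dtype=int)
       .T.groupby(level=0).sum().T)

# co-occurrence counts, with the diagonal (an item with itself) set to 0
v = u.T.dot(u)
v = v.where(~np.eye(len(v), dtype=bool), 0)

print(v)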

My output is as follows:

   a  b  c  d  e
a  0  1  0  1  0
b  1  0  1  1  2
c  0  1  0  1  1
d  1  1  1  0  1
e  0  2  1  1  0
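For comparison, the same pair counts can be produced without building a dense matrix. This is a minimal sketch using only the standard library (`collections.Counter` and `itertools.combinations`), not the pandas approach above. It assumes an item repeated inside one inner list should count once; the dummy-matrix version would count such repeats more than once.

```python
from collections import Counter
from itertools import combinations

lst = [
    ['a', 'b'],
    ['b', 'c', 'd', 'e'],
    ['a', 'd'],
    ['b', 'e'],
]

# Count each unordered pair once per inner list. De-duplicating and sorting
# the items means ('b', 'a') and ('a', 'b') always map to the same key.
pair_counts = Counter(
    pair
    for items in lst
    for pair in combinations(sorted(set(items)), 2)
)

print(pair_counts[('b', 'e')])  # 2
```

Each key corresponds to one cell of the upper triangle of the matrix above, and pairs that never co-occur are simply absent.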

I want to convert this co-occurrence matrix into a weighted, undirected networkx graph, where the edge weights are the co-occurrence counts from the matrix.

So far I have tried the following, but I am not sure how to add the weights to the graph.

print("get x and y pairs")
#get (x,y) pairs from the cooccurrence matrix
arr = np.where(v>=1)
corrs = [(v.index[x], v.columns[y]) for x, y in zip(*arr)]

#get the unique pairs
final_arr = []

for x, y in corrs:
    if (y,x) not in final_arr:
        final_arr.append((x,y))

#construct the graph
G = nx.Graph()
nodes_vocabulary_list = ['a', 'b', 'c', 'd', 'e']
G.add_nodes_from(nodes_vocabulary_list)
G.add_edges_from(final_arr)
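One way to attach the weights to the attempt above is networkx's `Graph.add_weighted_edges_from`, which takes `(u, v, w)` triples and stores `w` under the `'weight'` edge attribute. This is a sketch, assuming the matrix printed above; it is hard-coded here so the snippet runs on its own, and `final_arr` is rebuilt as the upper-triangle pairs with a positive count, which is the same set of pairs the loop above produces.

```python
import networkx as nx
import pandas as pd

# The co-occurrence matrix printed above, rebuilt so this example runs alone.
labels = ['a', 'b', 'c', 'd', 'e']
v = pd.DataFrame(
    [[0, 1, 0, 1, 0],
     [1, 0, 1, 1, 2],
     [0, 1, 0, 1, 1],
     [1, 1, 1, 0, 1],
     [0, 2, 1, 1, 0]],
    index=labels, columns=labels,
)

# Unique (x, y) pairs with a count of at least 1 (upper triangle only).
final_arr = [(x, y) for i, x in enumerate(labels)
             for y in labels[i + 1:] if v.loc[x, y] >= 1]

G = nx.Graph()
G.add_nodes_from(labels)
# Each triple (x, y, count) becomes an edge with attribute weight=count.
G.add_weighted_edges_from((x, y, int(v.loc[x, y])) for x, y in final_arr)

print(G['b']['e']['weight'])  # 2
```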

Is there a simpler way to do this in networkx?

I'm happy to provide more details if needed.

jez*_*ael 5

I believe you can use:

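import numpy as np
import pandas as pd

lst = [
    ['a', 'b'],
    ['b', 'c', 'd', 'e'],
    ['a', 'd'],
    ['b', 'e']
]

# 0/1 dummies; transposing lets groupby merge the duplicated column labels
u = (pd.get_dummies(pd.DataFrame(lst), prefix='', prefix_sep='', dtype=int)
       .T.groupby(level=0).sum().T)

v = u.T.dot(u)
#set 0 to lower triangular matrix (including the diagonal)
v = v.mask(np.tril(np.ones(v.shape, dtype=bool)), 0)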
print(v)
   a  b  c  d  e
a  0  1  0  1  0
b  0  0  1  1  2
c  0  0  0  1  1
d  0  0  0  0  1
e  0  0  0  0  0

#reshape and filter only count > 0
a = v.stack()
a = a[a >= 1].rename_axis(('source', 'target')).reset_index(name='weight')
print(a)
  source target  weight
0      a      b       1
1      a      d       1
2      b      c       1
3      b      d       1
4      b      e       2
5      c      d       1
6      c      e       1
7      d      e       1
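Zeroing the lower triangle keeps each undirected pair exactly once. As a sketch of the same idea without modifying `v`, the edge list can be read straight from the strict upper triangle with `np.triu_indices` (`k=1` skips the diagonal). The matrix is hard-coded from the question's output here, as an assumption, so the snippet runs alone.

```python
import numpy as np
import pandas as pd

labels = np.array(['a', 'b', 'c', 'd', 'e'])
# Symmetric co-occurrence matrix from the question (diagonal already 0).
m = np.array([[0, 1, 0, 1, 0],
              [1, 0, 1, 1, 2],
              [0, 1, 0, 1, 1],
              [1, 1, 1, 0, 1],
              [0, 2, 1, 1, 0]])

# Row/column indices of the strict upper triangle, in row-major order,
# so every unordered pair is visited exactly once.
rows, cols = np.triu_indices(len(labels), k=1)
weights = m[rows, cols]
keep = weights > 0  # drop pairs that never co-occur

a = pd.DataFrame({
    'source': labels[rows[keep]],
    'target': labels[cols[keep]],
    'weight': weights[keep],
})
print(a)
```

The resulting frame has the same `source`/`target`/`weight` rows as `a` above and can be passed to `from_pandas_edgelist` in the same way.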

Create the graph with from_pandas_edgelist:

import networkx as nx
G = nx.from_pandas_edgelist(a,  edge_attr=True)

print (nx.to_dict_of_dicts(G))
{'a': {'b': {'weight': 1}, 'd': {'weight': 1}}, 
 'b': {'a': {'weight': 1}, 'c': {'weight': 1}, 'd': {'weight': 1}, 'e': {'weight': 2}}, 
 'd': {'a': {'weight': 1}, 'b': {'weight': 1}, 'c': {'weight': 1}, 'e': {'weight': 1}}, 
 'c': {'b': {'weight': 1}, 'd': {'weight': 1}, 'e': {'weight': 1}}, 
 'e': {'b': {'weight': 2}, 'c': {'weight': 1}, 'd': {'weight': 1}}}
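A possibly shorter route, not part of the answer above, is `nx.from_pandas_adjacency`, which treats the DataFrame as an adjacency matrix: every nonzero cell becomes an edge whose `'weight'` attribute is the cell value, and zero cells produce no edge. This sketch assumes the question's full symmetric matrix (hard-coded below) and the default undirected `nx.Graph`.

```python
import networkx as nx
import pandas as pd

labels = ['a', 'b', 'c', 'd', 'e']
# Full symmetric co-occurrence matrix from the question, diagonal set to 0.
v = pd.DataFrame(
    [[0, 1, 0, 1, 0],
     [1, 0, 1, 1, 2],
     [0, 1, 0, 1, 1],
     [1, 1, 1, 0, 1],
     [0, 2, 1, 1, 0]],
    index=labels, columns=labels,
)

# Nonzero cells -> weighted edges; with an undirected Graph the symmetric
# matrix yields a single edge per pair.
G = nx.from_pandas_adjacency(v)

print(G['b']['e']['weight'])  # 2
```

Because it reads the symmetric matrix directly, this route does not need the lower-triangle masking. It does rely on the diagonal being zero; a nonzero diagonal entry would become a self-loop.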