EmJ*_*EmJ 3 python graph networkx pandas
我正在使用以下代码将我的列表列表转换为共现矩阵。
lst = [
['a', 'b'],
['b', 'c', 'd', 'e'],
['a', 'd'],
['b', 'e']
]
u = (pd.get_dummies(pd.DataFrame(lst), prefix='', prefix_sep='')
.groupby(level=0, axis=1)
.sum())
v = u.T.dot(u)
v.values[(np.r_[:len(v)], ) * 2] = 0
print(v)
Run Code Online (Sandbox Code Playgroud)
我的输出如下:
a b c d e
a 0 1 0 1 0
b 1 0 1 1 2
c 0 1 0 1 1
d 1 1 1 0 1
e 0 2 1 1 0
Run Code Online (Sandbox Code Playgroud)
我想将我的共现矩阵转换为weighted undirected
networkx 图,其中weights
表示矩阵中的共现计数。
目前,我已尝试按以下方式进行。但是,我不确定如何在图表中插入权重。
print("get x and y pairs")
#get (x,y) pairs from the cooccurrence matrix
arr = np.where(v>=1)
corrs = [(v.index[x], v.columns[y]) for x, y in zip(*arr)]
#get the unique pairs
final_arr = []
for x, y in corrs:
if (y,x) not in final_arr:
final_arr.append((x,y))
#construct the graph
G = nx.Graph()
nodes_vocabulary_list = ['a', 'b', 'c', 'd', 'e']
G.add_nodes_from(nodes_vocabulary_list)
G.add_edges_from(final_arr)
Run Code Online (Sandbox Code Playgroud)
我想知道是否有更简单的方法来做到这一点networkx
?
如果需要,我很乐意提供更多详细信息。
我相信你可以使用:
lst = [
['a', 'b'],
['b', 'c', 'd', 'e'],
['a', 'd'],
['b', 'e']
]
u = pd.get_dummies(pd.DataFrame(lst), prefix='', prefix_sep='').sum(level=0, axis=1)
v = u.T.dot(u)
#set 0 to lower triangular matrix
v.values[np.tril(np.ones(v.shape)).astype(np.bool)] = 0
print(v)
a b c d e
a 0 1 0 1 0
b 0 0 1 1 2
c 0 0 0 1 1
d 0 0 0 0 1
e 0 0 0 0 0
#reshape and filter only count > 0
a = v.stack()
a = a[a >= 1].rename_axis(('source', 'target')).reset_index(name='weight')
print(a)
source target weight
0 a b 1
1 a d 1
2 b c 1
3 b d 1
4 b e 2
5 c d 1
6 c e 1
7 d e 1
Run Code Online (Sandbox Code Playgroud)
创建图表 from_pandas_edgelist
import networkx as nx
G = nx.from_pandas_edgelist(a, edge_attr=True)
print (nx.to_dict_of_dicts(G))
{'a': {'b': {'weight': 1}, 'd': {'weight': 1}},
'b': {'a': {'weight': 1}, 'c': {'weight': 1}, 'd': {'weight': 1}, 'e': {'weight': 2}},
'd': {'a': {'weight': 1}, 'b': {'weight': 1}, 'c': {'weight': 1}, 'e': {'weight': 1}},
'c': {'b': {'weight': 1}, 'd': {'weight': 1}, 'e': {'weight': 1}},
'e': {'b': {'weight': 2}, 'c': {'weight': 1}, 'd': {'weight': 1}}}
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
1058 次 |
最近记录: |