dan*_*dan 4 python numpy dataframe pandas
I have an n x n numpy float64 sparse matrix (data, where n = 44), in which the rows and columns are graph nodes and the values are edge weights:
>>> data
<44x44 sparse matrix of type '<class 'numpy.float64'>'
with 668 stored elements in Compressed Sparse Row format>
>>> type(data)
<class 'scipy.sparse.csr.csr_matrix'>
>>> print(data)
(0, 7) 0.11793236293516568
(0, 9) 0.10992000939300195
(0, 21) 0.7422196678913772
(0, 23) 0.0630039712667936
(0, 24) 0.027037442463504143
(0, 27) 0.16908845414214152
(0, 28) 0.6109227233402952
(0, 32) 0.0514765253537568
(0, 33) 0.016341754080557713
(1, 6) 0.015070325434709386
(1, 10) 9.346673769086203e-05
(1, 11) 0.2471018034781923
(1, 14) 0.0020684269551621776
(1, 18) 0.015258704502643251
(1, 20) 0.021798149289490358
(1, 22) 0.0087026831764125
(1, 24) 0.1454235884185166
(1, 25) 0.022060777594183015
(1, 29) 0.9117391202819067
(1, 30) 0.018557883854566116
(1, 31) 0.001876070225734826
(1, 32) 0.025841354399637764
(1, 33) 0.014766488228364438
(1, 39) 0.002791226433410351
(1, 43) 1.0
: :
(41, 7) 0.8922099840113696
(41, 10) 0.015776226631920767
(41, 12) 1.0
(41, 15) 0.1839408706622038
(41, 18) 0.5151025641025642
(41, 20) 0.4599130036630037
(41, 22) 0.29378473237788827
(41, 33) 0.47474890700697153
(41, 39) 1.0
(42, 2) 1.0
(42, 10) 0.023305789342610222
(42, 11) 0.011349136164776494
(42, 12) 1.0
(42, 17) 0.886081346522542
(42, 18) 1.0
(42, 30) 1.0
(42, 40) 1.0
(43, 1) 1.0
(43, 6) 1.0
(43, 11) 0.039948959300013256
(43, 13) 1.0
(43, 14) 0.02669811947637717
(43, 29) 1.0
(43, 30) 1.0
(43, 36) 0.3381986531986532
I want to convert it to a pandas data frame so that I can write it to a file, with the columns node1, node2, edge_weight, which would give:
node1, node2, edge_weight
0, 7, 0.11793236293516568
0, 9, 0.10992000939300195
:, :, :
43, 36, 0.3381986531986532
Any idea how to do this?
Note:
>>> pandas.DataFrame(data)
gives:
0
0 (0, 7)\t0.11793236293516568\n (0, 9)\t0.109...
1 (0, 6)\t0.015070325434709386\n (0, 10)\t9.3...
and
>>> pandas.DataFrame(print(data))
gives:
(0, 7) 0.11793236293516568
(0, 9) 0.10992000939300195
So I think pandas.DataFrame(print(data)) is close to what I'm looking for.
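This ipython session shows one way to do it. There are two steps: convert the sparse matrix to COO format, then create a Pandas DataFrame from the .row, .col and .data attributes of the COO matrix.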
In [50]: data
Out[50]:
<15x15 sparse matrix of type '<class 'numpy.float64'>'
with 11 stored elements in Compressed Sparse Row format>
In [51]: print(data)
(1, 12) 0.8581958095588134
(6, 12) 0.03828052946099181
(6, 14) 0.7908634838351427
(7, 1) 0.7995008873930302
(7, 11) 0.48477191537121145
(7, 13) 0.6226526443518743
(9, 4) 0.37242576669669103
(11, 1) 0.9604278557580955
(11, 5) 0.13285436036287313
(12, 11) 0.5631419223609928
(13, 8) 0.16481624650723847
In [52]: import pandas as pd
In [53]: c = data.tocoo()
In [54]: df = pd.DataFrame({'node1': c.row, 'node2': c.col, 'edge_weight': c.data})
In [55]: df
Out[55]:
    node1  node2  edge_weight
0       1     12     0.858196
1       6     12     0.038281
2       6     14     0.790863
3       7      1     0.799501
4       7     11     0.484772
5       7     13     0.622653
6       9      4     0.372426
7      11      1     0.960428
8      11      5     0.132854
9      12     11     0.563142
10     13      8     0.164816
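For reference, here is the same idea as a small standalone script, including the step that produces the CSV text. This is only a sketch: the 3x3 matrix is made up for illustration, and the helper name sparse_to_edge_frame is my own, not part of pandas or scipy.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def sparse_to_edge_frame(matrix):
    """Return a DataFrame with one row per stored element of a sparse matrix.

    Hypothetical helper for illustration; the column names follow the question.
    """
    coo = matrix.tocoo()  # COO format exposes parallel .row, .col and .data arrays
    return pd.DataFrame(
        {"node1": coo.row, "node2": coo.col, "edge_weight": coo.data}
    )


# Made-up 3-node graph, stored as CSR like the matrix in the question.
dense = np.array(
    [
        [0.0, 0.5, 0.0],
        [0.0, 0.0, 1.0],
        [0.25, 0.0, 0.0],
    ]
)
data = sparse.csr_matrix(dense)

edges = sparse_to_edge_frame(data)

# With no path argument, to_csv returns the CSV as a string.
# Pass a filename instead (for example "edges.csv") to write it to disk.
csv_text = edges.to_csv(index=False)
print(csv_text)
```

index=False keeps the DataFrame's row index out of the output, so the file contains only the three requested columns.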