In the paper titled "Scaling of degree correlations and its influence on diffusion in scale-free networks", the authors define the quantity $E_b(k)$ to measure the extent of degree correlations.
L. K. Gallos, C. Song, and H. A. Makse, Scaling of Degree Correlations and Its Influence on Diffusion in Scale-Free Networks, Phys. Rev. Lett. 100, 248701 (2008).
My question is: how do I compute $E_b(k)$ for a network using Python? My problem is that I cannot reproduce the authors' results. I tested with the condensed-matter collaboration data (ca-CondMat), and the resulting $E_b(k)$ is shown in the figure above. One obvious problem in my figure is that $E_b(k)$ is much larger than 1! I also tried the Internet (AS-level) data and the WWW data, and the problem persists, so there must be something seriously wrong with my algorithm or code. You can reproduce my results and compare them with the authors'. Any solution or suggestion is greatly appreciated. I describe my algorithm and Python script below.
The Python script is as follows:
%matplotlib inline
import networkx as nx
import matplotlib.cm as cm
import matplotlib.pyplot as plt
from collections import defaultdict
def ebks(g, b):
    edge_dict = defaultdict(lambda: defaultdict(int))
    degree_dict = defaultdict(int)
    # degrees of the two endpoints of every edge, sorted so that e[0] <= e[-1]
    edge_degree = [sorted(g.degree(e).values()) for e in g.edges()]
    for e in edge_degree:
        edge_dict[e[0]][e[-1]] += 1
    for i in g.degree().values():
        degree_dict[i] += 1
    edge_number = g.number_of_edges()
    node_number = g.number_of_nodes()
    ebks, ks = [], []
    for k1 in edge_dict:
        p1, p2 = 0, 0
        for k2 in edge_dict[k1]:
            if k2 >= b*k1:
                pkk = float(edge_dict[k1][k2])/edge_number   # joint degree distribution P(k1, k2)
                pk2 = float(degree_dict[k2])/node_number     # degree distribution P(k2)
                k2pk2 = k2*pk2
                p1 += pkk/k2pk2
        for k in degree_dict:
            if k >= b*k1:
                pk = float(degree_dict[k])/node_number
                p2 += pk
        if p2 > 0:
            ebks.append(p1/p2)
            ks.append(k1)
    return ebks, ks
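To make the algorithm explicit, this is what I intend the function above to estimate (my own notation, reconstructed from the code, so it may not match the paper's definition exactly):

$$E_b(k_1) \approx \frac{\sum_{k_2 \ge b k_1} P(k_1, k_2) / \bigl(k_2 P(k_2)\bigr)}{\sum_{k \ge b k_1} P(k)}$$

where $P(k_1, k_2)$ is the fraction of edges whose endpoint degrees are $k_1 \le k_2$, and $P(k)$ is the fraction of nodes with degree $k$.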
I used the ca-CondMat data for testing, which you can download from http://snap.stanford.edu/data/ca-CondMat.html
# Load the data
# Remember to change the file path to your own
ca = nx.Graph()
with open('/path-of-your-file/ca-CondMat.txt') as f:
    for line in f:
        if line[0] != '#':
            x, y = line.strip().split('\t')
            ca.add_edge(x, y)
nx.info(ca)
#calculate ebk
ebk, k = ebks(ca, b=3)
plt.plot(k,ebk,'r^')
plt.xlabel(r'$k$', fontsize = 16)
plt.ylabel(r'$E_b(k)$', fontsize = 16)
plt.xscale('log')
plt.yscale('log')
plt.show()
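A note on versions: the scripts here use the networkx 1.x API, where g.degree() returns a dict and g.degree(e) accepts the two endpoints of an edge. Under networkx 2.x the degree calls need a small change; a minimal sketch of a replacement helper (my own addition, the name is arbitrary):

# Helper for networkx >= 2.0, where degree() returns a view instead of a dict.
# Produces the same sorted (k_min, k_max) pair per edge as the list
# comprehension used inside ebks/ebkss.
def edge_degree_pairs(g):
    deg = dict(g.degree())  # works on both networkx 1.x and 2.x
    return [sorted((deg[u], deg[v])) for u, v in g.edges()]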
Update: the problem is still not solved.
import numpy as np  # needed for np.sum below

def ebkss(g, b, x):
    edge_dict = defaultdict(lambda: defaultdict(int))
    degree_dict = defaultdict(int)
    edge_degree = [sorted(g.degree(e).values()) for e in g.edges()]
    for e in edge_degree:
        edge_dict[e[0]][e[-1]] += 1
    for i in g.degree().values():
        degree_dict[i] += 1
    edge_number = g.number_of_edges()
    node_number = g.number_of_nodes()
    ebks, ks = [], []
    for k1 in edge_dict:
        p1, p2 = 0, 0
        nk2k = np.sum(list(edge_dict[k1].values()))  # number of edges whose smaller degree is k1
        pk1 = float(degree_dict[k1])/node_number
        k1pk1 = k1*pk1
        for k2 in edge_dict[k1]:
            if k2 >= b*k1:
                pk2k = float(edge_dict[k1][k2])/nk2k   # P(k2 | k1) estimated over these edges
                pk2 = float(degree_dict[k2])/node_number
                k2pk2 = k2*pk2
                p1 += (pk2k*k1pk1)/k2pk2
        for k in degree_dict:
            if k >= b*k1:
                pk = float(degree_dict[k])/node_number
                p2 += pk
        if p2 > 0:
            ebks.append(p1/p2**x)
            ks.append(k1)
    return ebks, ks
To log-bin the data, the following function can be used:
import numpy as np

def log_binning(x, y, bin_count=35):
    max_x = np.log10(max(x))
    max_y = np.log10(max(y))
    max_base = max([max_x, max_y])
    xx = [i for i in x if i > 0]
    min_x = np.log10(np.min(xx))
    bins = np.logspace(min_x, max_base, num=bin_count)
    bin_means_y = np.histogram(x, bins, weights=y)[0] / np.histogram(x, bins)[0]
    bin_means_x = np.histogram(x, bins, weights=x)[0] / np.histogram(x, bins)[0]
    return bin_means_x, bin_means_y
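Note that bins containing no points lead to a division by zero in the two histogram ratios above, so the returned arrays may contain NaN entries. A small helper (my own addition, not part of the original script) to drop them before plotting:

import numpy as np

def drop_empty_bins(bx, by):
    # Keep only bins where both the x- and y-means are defined (non-NaN).
    mask = ~(np.isnan(bx) | np.isnan(by))
    return bx[mask], by[mask]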
If you want to bin the data linearly, use the following function:
def LinearBinData(x, y, number):
    data = sorted(zip(x, y))
    rs = np.linspace(min(x), max(x), number)
    rs = np.transpose(np.vstack((rs[:-1], rs[1:])))  # (start, end) of each bin
    ndata = []
    for start, end in rs:
        # reset for every bin, otherwise the means accumulate across bins
        within = [j for i, j in data if start <= i < end]
        ndata.append([(start+end)/2.0, np.mean(np.array(within))])
    bx, by = np.array(ndata).T  # avoid shadowing the networkx alias nx
    return bx, by
In general, for scaling relations, log-binning is the better choice.
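For completeness, a rough sketch of how I put the pieces together (this assumes ca, ebkss, and log_binning from above are already defined; b=3, x=1, and the bin count are just example values):

# Compute Eb(k) with the updated function, log-bin the curve, and plot both.
ebk, k = ebkss(ca, b=3, x=1)
bx, by = log_binning(k, ebk, bin_count=25)

plt.plot(k, ebk, 'c.', alpha=0.3, label='raw')
plt.plot(bx, by, 'r^', label='log-binned')
plt.xscale('log')
plt.yscale('log')
plt.xlabel(r'$k$', fontsize=16)
plt.ylabel(r'$E_b(k)$', fontsize=16)
plt.legend(loc='best')
plt.show()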