use*_*672 11 python numpy scipy sparse-matrix correlation
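Does anyone know how to compute a correlation matrix from a very large sparse matrix in Python? Basically, I am looking for something like numpy.corrcoef that works on a scipy sparse matrix.
You can compute the correlation coefficients fairly directly from the covariance matrix, like so: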
import numpy as np
from scipy import sparse

def sparse_corrcoef(A, B=None):
    # Rows are variables, columns are observations (as in np.corrcoef)
    if B is not None:
        A = sparse.vstack((A, B), format='csr')
    A = A.astype(np.float64)
    n = A.shape[1]
    # Compute the covariance matrix
    rowsum = A.sum(1)
    centering = rowsum.dot(rowsum.T.conjugate()) / n
    C = (A.dot(A.T.conjugate()) - centering) / (n - 1)
    # The correlation coefficients are given by
    # C_{i,j} / sqrt(C_{i,i} * C_{j,j})
    d = np.diag(C)
    coeffs = C / np.sqrt(np.outer(d, d))
    return coeffs
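For reference, the quantity the function computes can be written out as below. This is a sketch in notation that does not appear in the original answer: $x_i$ stands for row $i$ of the stacked matrix, $s_i$ for the sum of its entries, and $n$ for the number of columns; the data are assumed real-valued, so the conjugate has no effect.

```latex
C_{ij} = \frac{1}{n-1}\left( x_i \cdot x_j - \frac{s_i\, s_j}{n} \right),
\qquad
r_{ij} = \frac{C_{ij}}{\sqrt{C_{ii}\, C_{jj}}}
```

Only the product $x_i \cdot x_j$ involves the sparse matrix itself; the centering term is built from the row sums alone, which is why the matrix never has to be centered (and so densified) explicitly.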
Check that it works:
# some smallish sparse random matrices
a = sparse.rand(100, 100000, density=0.1, format='csr')
b = sparse.rand(100, 100000, density=0.1, format='csr')
coeffs1 = sparse_corrcoef(a, b)
coeffs2 = np.corrcoef(a.todense(), b.todense())
print(np.allclose(coeffs1, coeffs2))
# True
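The amount of storage needed to compute the covariance matrix C depends heavily on the sparsity structure of A (and of B, if given). For example, if A is an (m, n) matrix containing just a single column of non-zero values, then the product of A with its transpose, and hence C, is an (m, m) matrix in which every entry is non-zero. If m is large, this could be very bad news in terms of memory consumption.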
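One way to keep the dense result manageable is to compute only a block of rows of the correlation matrix at a time. The following is a minimal sketch of that idea, not part of the original answer; the function name `sparse_corr_block` and its `rows` parameter are hypothetical, and it assumes the same layout as above (rows are variables, columns are observations).

```python
import numpy as np
from scipy import sparse

def sparse_corr_block(A, rows):
    """Correlate the selected rows of A against every row of A.

    Hypothetical helper: `rows` is a slice or index array, and the
    result is a dense array of shape (len(rows), m).
    """
    A = sparse.csr_matrix(A, dtype=np.float64)
    n = A.shape[1]
    # Per-row mean and sample standard deviation, without densifying A
    mean = np.asarray(A.mean(axis=1)).ravel()
    sq_sum = np.asarray(A.multiply(A).sum(axis=1)).ravel()
    std = np.sqrt((sq_sum - n * mean**2) / (n - 1))
    # Only this (len(rows), m) block is ever stored densely
    cross = (A[rows] @ A.T).toarray()
    cov = (cross - n * np.outer(mean[rows], mean)) / (n - 1)
    return cov / np.outer(std[rows], std)

# Usage: the first 5 rows of the correlation matrix of a small random matrix
a = sparse.random(20, 500, density=0.1, format='csr', random_state=0)
block = sparse_corr_block(a, slice(0, 5))
```

Looping over successive row blocks and writing each one to disk (or reducing it straight away) keeps peak memory proportional to the block size rather than to m squared.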
Using just numpy:
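import numpy as np
# A: (N, p) scipy sparse matrix; rows are observations, columns are variables
N = A.shape[0]
colsum = np.asarray(A.sum(axis=0)).reshape(1, -1)  # dense (1, p) column sums
C = np.asarray((A.T @ A - colsum.T @ colsum / N) / (N - 1))
d = np.diag(C)
# Despite its name, COV holds the correlation matrix; 1e-119 guards against 0/0
COV = C / (np.sqrt(np.outer(d, d)) + 1e-119)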