Efficient way to normalize a Scipy sparse matrix

ste*_*rne 23 python numpy scipy sparse-matrix

I'd like to write a function that normalizes the rows of a large sparse matrix (so that they sum to one).

from pylab import *
import scipy.sparse as sp

def normalize(W):
    z = W.sum(0)
    z[z < 1e-6] = 1e-6
    return W / z[None,:]

w = (rand(10,10)<0.1)*rand(10,10)
w = sp.csr_matrix(w)
w = normalize(w)

However, this raises the following exception:

File "/usr/lib/python2.6/dist-packages/scipy/sparse/base.py", line 325, in __div__
     return self.__truediv__(other)
File "/usr/lib/python2.6/dist-packages/scipy/sparse/compressed.py", line 230, in  __truediv__
   raise NotImplementedError

Is there a reasonably simple solution? I have looked at this, but I'm still unclear on how to actually do the division.

Aar*_*aid 41

This is already implemented in scikit-learn as sklearn.preprocessing.normalize.

from sklearn.preprocessing import normalize
w_normalized = normalize(w, norm='l1', axis=1)

axis=1 normalizes by rows, axis=0 normalizes by columns. Use the optional argument copy=False to modify the matrix in place.
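As a quick sanity check, the call above can be tried on a small random matrix. This is only a sketch: the seed, the 10x10 shape, the 0.3 density and the variable names are assumptions for illustration, not part of the original answer. Because all entries are non-negative, the L1 norm of a row equals its sum, so every non-empty row should sum to 1 afterwards.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.preprocessing import normalize

# Small random sparse matrix with non-negative entries (illustrative only)
rng = np.random.default_rng(0)
dense = (rng.random((10, 10)) < 0.3) * rng.random((10, 10))
w = sp.csr_matrix(dense)

# norm='l1' divides each row by the sum of the absolute values in that row
w_rows = normalize(w, norm='l1', axis=1)

# Compare row sums before and after; all-zero rows are left as all zeros
row_sums = np.asarray(w_rows.sum(axis=1)).ravel()
nonempty = np.asarray(w.sum(axis=1)).ravel() > 0
print(row_sums[nonempty])
```

The result stays sparse, so this should scale to matrices that would not fit in memory as dense arrays.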

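  • Note that if you normalize by feature (axis=0), the returned matrix is of type 'csc', even if w is 'csr'. This can be an unpleasant surprise if you are counting on it being 'csr'. (3 upvotes)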
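If pulling in scikit-learn is not an option, the row scaling can also be sketched with scipy alone. The idea is to avoid dividing a sparse matrix by a dense array, which is what raised the exception in the question, and instead left-multiply by a diagonal matrix holding the inverse row sums. The helper name normalize_rows and its eps parameter are hypothetical, chosen to mirror the 1e-6 floor in the question. Note also that W.sum(0) in the question sums over axis 0, i.e. over columns; row sums need axis=1.

```python
import numpy as np
import scipy.sparse as sp

def normalize_rows(W, eps=1e-6):
    """Scale each row of a sparse matrix so it sums to 1 (hypothetical helper)."""
    W = sp.csr_matrix(W, dtype=float)
    # Row sums as a flat 1-D array (sum over axis=1, not axis=0)
    row_sums = np.asarray(W.sum(axis=1)).ravel()
    # Same floor as in the question, so empty rows do not divide by zero
    row_sums[row_sums < eps] = eps
    # Left-multiplying by a diagonal matrix scales row i by 1 / row_sums[i]
    D = sp.diags(1.0 / row_sums)
    return (D @ W).tocsr()

# Illustrative data; seed, shape and density are arbitrary assumptions
rng = np.random.default_rng(0)
w = sp.csr_matrix((rng.random((10, 10)) < 0.3) * rng.random((10, 10)))
w_norm = normalize_rows(w)
print(np.asarray(w_norm.sum(axis=1)).ravel())
```

The explicit .tocsr() at the end keeps the output format predictable, which sidesteps the kind of format surprise mentioned in the comment above.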