Normalize sparse row probability matrix

Question

Pav*_*lin 0 python numpy scipy sparse-matrix

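I have a sparse matrix with a few nonzero elements, and I would like to row-normalize it. However, when I do this, the result is converted to a dense NumPy array, which is unacceptable from a performance standpoint.

To make things more concrete, consider the following example: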

from scipy.sparse import csr_matrix

x = csr_matrix([[0, 1, 1], [2, 3, 0]])  # sparse
normalization = x.sum(axis=1)  # dense, this is OK

x / normalization  # this is dense, not OK, can be huge


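Is there an elegant way to do this without resorting to for loops?

Edit

Yes, this can be done with sklearn.preprocessing.normalize using 'l1' normalization; however, I do not want to depend on sklearn.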

Answer 1

Pau*_*zer 6

You can always use the CSR internals:

>>> import numpy as np
>>> from scipy import sparse
>>> 
>>> x = sparse.csr_matrix([[0, 1, 1], [2, 3, 0]]) 
>>> 
>>> x.data = x.data / np.repeat(np.add.reduceat(x.data, x.indptr[:-1]), np.diff(x.indptr))
>>> x
<2x3 sparse matrix of type '<class 'numpy.float64'>'
        with 4 stored elements in Compressed Sparse Row format>
>>> x.toarray()
array([[0. , 0.5, 0.5],
       [0.4, 0.6, 0. ]])
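The one-liner above packs three steps into a single expression. A minimal sketch that spells them out, using the same example matrix, might look like the following. The intermediate names (row_starts, nnz_per_row, row_sums) are introduced here for illustration only, and the sketch assumes that every row has at least one stored entry, because np.add.reduceat does not return 0 for an empty segment.

```python
import numpy as np
from scipy import sparse

x = sparse.csr_matrix([[0, 1, 1], [2, 3, 0]], dtype=np.float64)

# In CSR format, x.data[x.indptr[i]:x.indptr[i + 1]] holds the stored
# values of row i.
row_starts = x.indptr[:-1]        # start offset of each row in x.data
nnz_per_row = np.diff(x.indptr)   # number of stored values per row

# Sum the stored values of each row (assumes no row is empty).
row_sums = np.add.reduceat(x.data, row_starts)

# Expand to one divisor per stored value and rescale in place,
# so only x.data is touched and the matrix stays sparse.
x.data /= np.repeat(row_sums, nnz_per_row)

print(x.toarray())
```

Only the array of stored values is modified, so memory use stays proportional to the number of nonzeros rather than to the full matrix shape.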

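If some rows may be completely empty, the reduceat call in the answer is not safe as written: for an empty row it returns the value stored at that offset rather than 0, and it raises an IndexError when a trailing row is empty. A possible variant, sketched here under the assumption that the entries are nonnegative (so a row with stored values never sums to zero), takes the row sums from x.sum(axis=1) instead. The function name normalize_rows is hypothetical and not part of SciPy.

```python
import numpy as np
from scipy import sparse


def normalize_rows(x):
    """Return a CSR copy of x whose non-empty rows each sum to 1.

    Hypothetical helper for illustration; assumes nonnegative entries.
    """
    # Copy and cast to float so the caller's matrix is left untouched.
    out = sparse.csr_matrix(x, dtype=np.float64, copy=True)
    # Dense vector with one sum per row; small compared with the matrix.
    row_sums = np.asarray(out.sum(axis=1)).ravel()
    # An empty row is repeated zero times, so its zero sum is never
    # used as a divisor.
    out.data /= np.repeat(row_sums, np.diff(out.indptr))
    return out


m = normalize_rows([[0, 1, 1], [0, 0, 0], [2, 3, 0]])
print(m.toarray())
```

Empty rows simply stay empty in the result, and the output remains in CSR format.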
