Scipy Sparse Cumsum

ham*_*els 11 python numpy sum scipy cumsum

Suppose I have a scipy.sparse.csr_matrix representing the values below:

[[0 0 1 2 0 3 0 4]
 [1 0 0 2 0 3 4 0]]

I want to compute the cumulative sum of the non-zero values in place, which would change the array to:

[[0 0 1 3 0 6 0 10]
 [1 0 0 3 0 6 10 0]]

The actual values are not 1, 2, 3, ...

The number of non-zero values in each row is unlikely to be the same.

How can this be done quickly?

Current program:

import scipy.sparse
import numpy as np

# sparse data
a = scipy.sparse.csr_matrix(
    [[0,0,1,2,0,3,0,4],
     [1,0,0,2,0,3,4,0]], 
    dtype=int)

# method
indptr = a.indptr
data = a.data
for i in range(a.shape[0]):
    st = indptr[i]
    en = indptr[i + 1]
    np.cumsum(data[st:en], out=data[st:en])

# print result
print(a.todense())

Result:

[[ 0  0  1  3  0  6  0 10]
 [ 1  0  0  3  0  6 10  0]]

DJK*_*DJK 2

How about doing it this way:

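import numpy as np

a = np.array([[0,0,1,2,0,3,0,4],
              [1,0,0,2,0,3,4,0]], dtype=int)

# Build a 0/1 mask of the non-zero positions.
# Note: `b > 0` only marks positive entries, so this assumes no negative values.
b = a.copy()
b[b > 0] = 1
# Dense cumulative sum along each row, then zero out the originally-zero positions.
z = np.cumsum(a,axis=1)
print(z*b)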

Output:

array([[ 0,  0,  1,  3,  0,  6,  0, 10],
       [ 1,  0,  0,  3,  0,  6, 10,  0]])
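The question notes that the real values are not 1, 2, 3, ..., so they may include negatives, in which case a `b > 0` mask would miss them. A minimal sketch of the same dense idea with a sign-independent mask follows; the function name `masked_cumsum` and the sample values are made up for illustration and are not part of the original answer.

```python
import numpy as np

def masked_cumsum(a):
    """Row-wise cumulative sum, kept only where the input is non-zero."""
    mask = a != 0                      # True at every stored value, whatever its sign
    return np.cumsum(a, axis=1) * mask

# Hypothetical sample data with negative entries
a_neg = np.array([[0, 0, 1, -2, 0, 3, 0, 4],
                  [1, 0, 0, 2, 0, -3, 4, 0]], dtype=int)
print(masked_cumsum(a_neg))
```

A running sum can itself be 0 at a non-zero position (as in the second row here), so the result may not keep the same sparsity pattern as the input.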

Timing the sparse approach:

def sparse(a):
    a = scipy.sparse.csr_matrix(a)

    indptr = a.indptr
    data = a.data
    for i in range(a.shape[0]):
        st = indptr[i]
        en = indptr[i + 1]
        np.cumsum(data[st:en], out=data[st:en])


In[1]: %timeit sparse(a)
10000 loops, best of 3: 167 µs per loop

Timing the multiplication approach:

def mult(a):
    b = a.copy()
    b[b > 0] = 1
    z = np.cumsum(a, axis=1)
    z * b

In[2]: %timeit mult(a)
100000 loops, best of 3: 5.93 µs per loop
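Two caveats about the timings above: `sparse(a)` also pays for converting the dense array to CSR on every call, and `mult` needs the full dense array, which may not be practical for a large sparse matrix. If the data has to stay sparse, the per-row Python loop can be replaced by one cumulative sum over all stored values, minus the running total that had accumulated before each row started. This is only a sketch under stated assumptions: the function name `csr_row_cumsum_inplace` is made up here, and it assumes the column indices within each row are sorted (call `sort_indices()` first if unsure).

```python
import numpy as np
import scipy.sparse

def csr_row_cumsum_inplace(m):
    """Overwrite the stored values of each CSR row with their running sum."""
    data, indptr = m.data, m.indptr
    if data.size == 0:
        return m
    total = np.cumsum(data)                      # running sum across all rows
    prefix = np.concatenate(([0], total))        # prefix[k] == sum(data[:k])
    row_offsets = prefix[indptr[:-1]]            # total accumulated before each row
    row_lengths = np.diff(indptr)                # stored values per row (may be 0)
    data[:] = total - np.repeat(row_offsets, row_lengths)
    return m

a = scipy.sparse.csr_matrix(
    [[0, 0, 1, 2, 0, 3, 0, 4],
     [1, 0, 0, 2, 0, 3, 4, 0]],
    dtype=int)
csr_row_cumsum_inplace(a)
print(a.toarray())
```

For floating-point data the subtraction can round slightly differently from a true per-row cumulative sum, so the loop version may be preferable when exact per-row accumulation matters.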