ham*_*els 11 python numpy sum scipy cumsum
Suppose I have a scipy.sparse.csr_matrix representing the values below:
[[0 0 1 2 0 3 0 4]
[1 0 0 2 0 3 4 0]]
I want to compute the cumulative sum of the non-zero values in place, which would change the array to:
[[0 0 1 3 0 6 0 10]
[1 0 0 3 0 6 10 0]]
The actual values are not 1, 2, 3, ...
The number of non-zero values in each row is unlikely to be the same.
How can I do this quickly?
Current program:
import scipy.sparse
import numpy as np
# sparse data
a = scipy.sparse.csr_matrix(
    [[0,0,1,2,0,3,0,4],
     [1,0,0,2,0,3,4,0]],
    dtype=int)
# method
indptr = a.indptr
data = a.data
for i in range(a.shape[0]):
    st = indptr[i]
    en = indptr[i + 1]
    np.cumsum(data[st:en], out=data[st:en])
# print result
print(a.todense())
Result:
[[ 0 0 1 3 0 6 0 10]
[ 1 0 0 3 0 6 10 0]]
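The per-row loop above can also be expressed without a Python-level loop. The following is a minimal sketch, not a benchmarked solution: it takes one cumulative sum over all stored values and then subtracts, for each row, the total accumulated before that row starts. The function name `csr_row_cumsum_inplace` is hypothetical, and the sketch assumes the running total over the whole `data` array neither overflows (integers) nor loses meaningful precision (floats).

```python
import numpy as np
import scipy.sparse


def csr_row_cumsum_inplace(m):
    """Row-wise cumulative sum of the stored values of a CSR matrix, in place.

    Hypothetical helper for illustration only.
    """
    data, indptr = m.data, m.indptr
    if data.size == 0:
        return m
    # Running total over every stored value, ignoring row boundaries.
    total = np.cumsum(data)
    # padded[k] is the sum of data[:k], so padded[indptr[i]] is everything
    # stored before row i begins.
    padded = np.concatenate((np.zeros(1, dtype=total.dtype), total))
    offsets = padded[indptr[:-1]]
    # Repeat each row's offset once per stored value in that row
    # (empty rows contribute zero repeats).
    row_lengths = np.diff(indptr)
    data[:] = total - np.repeat(offsets, row_lengths)
    return m


a = scipy.sparse.csr_matrix(
    [[0, 0, 1, 2, 0, 3, 0, 4],
     [1, 0, 0, 2, 0, 3, 4, 0]],
    dtype=int)
csr_row_cumsum_inplace(a)
print(a.toarray())
```

Whether this beats the explicit loop depends on the number of rows and the number of stored values per row, so it is worth timing both on data shaped like the real input.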
How about doing it this way:

a = np.array([[0,0,1,2,0,3,0,4],
              [1,0,0,2,0,3,4,0]], dtype=int)

b = a.copy()
b[b > 0] = 1
z = np.cumsum(a, axis=1)
print(z*b)

Output:

array([[ 0,  0,  1,  3,  0,  6,  0, 10],
       [ 1,  0,  0,  3,  0,  6, 10,  0]])

Sparse version:

def sparse(a):
    a = scipy.sparse.csr_matrix(a)

    indptr = a.indptr
    data = a.data
    for i in range(a.shape[0]):
        st = indptr[i]
        en = indptr[i + 1]
        np.cumsum(data[st:en], out=data[st:en])
    return a

In[1]: %timeit sparse(a)
10000 loops, best of 3: 167 µs per loop

Using multiplication:

def mult(a):
    b = a.copy()
    b[b > 0] = 1
    z = np.cumsum(a, axis=1)
    return z * b

In[2]: %timeit mult(a)
100000 loops, best of 3: 5.93 µs per loop
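One caveat with the multiplication approach: the mask `b[b > 0] = 1` only marks positive entries, and the question says the real values are not 1, 2, 3, so negative entries would be multiplied by themselves instead of by 1. Below is a minimal sketch of a dense variant that masks on `a != 0` instead. The name `masked_cumsum` is hypothetical, and like the code above it assumes the array is small enough to hold densely.

```python
import numpy as np


def masked_cumsum(a):
    """Row-wise cumulative sum, kept only where the input is non-zero.

    Hypothetical helper for illustration only; works on a dense array.
    """
    a = np.asarray(a)
    # Zeros add nothing to the running total, so a plain cumsum along each
    # row already holds the right value at every non-zero position.
    running = np.cumsum(a, axis=1)
    # Restore zeros wherever the input was zero.
    return np.where(a != 0, running, 0)


a = np.array([[0, 0, 1, 2, 0, 3, 0, 4],
              [1, 0, 0, 2, 0, 3, 4, 0]], dtype=int)
print(masked_cumsum(a))
```

Note also that the `sparse` timing above includes building the `csr_matrix` from a dense array on every call, so the two timings do not compare the cumulative-sum step alone.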