Con*_*Hui 2 python numpy scipy
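Basically, I am just trying to do a simple matrix multiplication: specifically, extract each column of the matrix and normalize it by dividing it by its length.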
# csc sparse matrix
self.__WeightMatrix__ = self.__WeightMatrix__.tocsc()
# iterate through columns
for Col in xrange(self.__WeightMatrix__.shape[1]):
    Column = self.__WeightMatrix__[:,Col].data
    List = [x**2 for x in Column]
    # get the column length
    Len = math.sqrt(sum(List))
    # here I assumed dot(number, Column) would do a basic scalar product
    dot((1/Len), Column)
    # now what? How do I update the original column of the matrix? Everything returned is a copy, which drove me nuts and made me miss pointers so much
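I searched the scipy sparse matrix documentation but found nothing useful. I was hoping for a function that returns a pointer/reference to the matrix so that I could modify its values directly. Thanks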
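In CSC format you have two writable attributes, data and indices, which hold the non-zero entries of your matrix and the corresponding row indices. You can use these to your advantage as follows: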
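import numpy as np

def sparse_row_normalize(sps_mat):
    # Works in place, so the matrix must already be in CSC format.
    if sps_mat.format != 'csc':
        msg = 'Can only row-normalize in place with csc format, not {0}.'
        msg = msg.format(sps_mat.format)
        raise ValueError(msg)
    # indices holds the row of each stored value, so bincount sums the squares per row.
    row_norm = np.sqrt(np.bincount(sps_mat.indices, weights=sps_mat.data * sps_mat.data))
    sps_mat.data /= np.take(row_norm, sps_mat.indices)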
To see that it really works:
>>> mat = scipy.sparse.rand(4, 4, density=0.5, format='csc')
>>> mat.toarray()
array([[ 0.        ,  0.        ,  0.58931687,  0.31070526],
       [ 0.24024639,  0.02767106,  0.22635696,  0.85971295],
       [ 0.        ,  0.        ,  0.13613897,  0.        ],
       [ 0.        ,  0.13766507,  0.        ,  0.        ]])
>>> mat.toarray() / np.sqrt(np.sum(mat.toarray()**2, axis=1))[:, None]
array([[ 0.        ,  0.        ,  0.88458487,  0.46637926],
       [ 0.26076366,  0.03003419,  0.24568806,  0.93313324],
       [ 0.        ,  0.        ,  1.        ,  0.        ],
       [ 0.        ,  1.        ,  0.        ,  0.        ]])
>>> sparse_row_normalize(mat)
>>> mat.toarray()
array([[ 0.        ,  0.        ,  0.88458487,  0.46637926],
       [ 0.26076366,  0.03003419,  0.24568806,  0.93313324],
       [ 0.        ,  0.        ,  1.        ,  0.        ],
       [ 0.        ,  1.        ,  0.        ,  0.        ]])
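Why grouping by indices yields row norms can be seen on a tiny matrix. The sketch below uses made-up values, prints the three CSC arrays, and recomputes the row norms with np.bincount. The names dense, row_sq and row_norm are only illustrative, and minlength is an addition here so that trailing empty rows also get an entry.

```python
import numpy as np
import scipy.sparse as sp

# Small example matrix; the values are chosen only for illustration.
dense = np.array([[0.0, 0.0, 3.0],
                  [4.0, 0.0, 4.0],
                  [0.0, 5.0, 0.0]])
mat = sp.csc_matrix(dense)

# CSC stores the non-zero values column by column.
print(mat.data)     # stored values: 4, 5, 3, 4
print(mat.indices)  # row index of each stored value: 1, 2, 0, 1
print(mat.indptr)   # column j occupies data[indptr[j]:indptr[j + 1]]: 0, 1, 2, 4

# Summing the squared values grouped by row index gives the squared row norms.
row_sq = np.bincount(mat.indices, weights=mat.data ** 2,
                     minlength=mat.shape[0])
row_norm = np.sqrt(row_sq)
print(row_norm)     # Euclidean length of each row
```

Because mat.data is the matrix's own storage rather than a copy, dividing it in place changes the matrix itself, which is what the question was after.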
The function is also fast, with no Python loops spoiling the fun:
In [2]: mat = scipy.sparse.rand(10000, 10000, density=0.005, format='csc')
In [3]: mat
Out[3]:
<10000x10000 sparse matrix of type '<type 'numpy.float64'>'
        with 500000 stored elements in Compressed Sparse Column format>
In [4]: %timeit sparse_row_normalize(mat)
100 loops, best of 3: 14.1 ms per loop
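The question asked about columns rather than rows. A possible sketch of the same in-place idea for columns follows. Here sparse_col_normalize is a hypothetical name, not part of SciPy, and the code assumes a floating-point CSC matrix with no explicitly stored zeros in otherwise empty columns (those would produce nan).

```python
import numpy as np
import scipy.sparse as sp

def sparse_col_normalize(sps_mat):
    """Scale each column of a CSC matrix to unit Euclidean length, in place."""
    if sps_mat.format != 'csc':
        raise ValueError('Expected csc format, got {0}.'.format(sps_mat.format))
    n_cols = sps_mat.shape[1]
    # Number of stored entries in each column.
    counts = np.diff(sps_mat.indptr)
    # Column index of every stored entry, in storage order.
    col_of_entry = np.repeat(np.arange(n_cols), counts)
    # Sum of squares per column, then the square root gives the column lengths.
    col_norm = np.sqrt(np.bincount(col_of_entry,
                                   weights=sps_mat.data ** 2,
                                   minlength=n_cols))
    # Only stored entries are touched, so empty columns are never divided by zero.
    sps_mat.data /= col_norm[col_of_entry]

# Usage on a small matrix with one empty column (illustrative values).
example = sp.csc_matrix(np.array([[3.0, 0.0, 0.0],
                                  [4.0, 0.0, 2.0]]))
sparse_col_normalize(example)
print(example.toarray())
```

The only difference from the row version is the grouping key: row indices are stored directly in indices, whereas the column of each entry has to be rebuilt from indptr.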