Does the "mean" function of a numpy csr matrix average over the whole matrix? How do I remove certain values?

Question

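I have a numpy csr matrix and I want to get its mean, but it contains a lot of zeros because I eliminated all values on and below the main diagonal and kept only the upper-triangle values. After converting it to an array, my csr matrix now looks like this: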

   0.          0.          0.          0.          0.          0.          0.
   0.          0.          0.          0.          0.          0.          0.
   0.          0.          0.          0.          0.63646664  0.34827262
   0.24316454  0.1362165   0.63646664  0.15762204  0.31692202  0.12114576
   0.35917146

As far as I know, the zeros are important for the csr matrix to work and to display something like this:

(0,5) 0.5790418
(3,10) 0.578210
(5,20) 0.912370
(67,5) 0.1093109

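I saw that csr matrices have their own mean function, but does this mean function take all the zeros into account, and therefore divide by the number of elements in the array including the zeros? I only need the mean of the nonzero values. My matrix contains the similarities between multiple vectors and is more like a list of matrices: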

[[ 0.          0.63646664  0.48492084  0.42134077  0.14366401  0.10909745
   0.06172853  0.08116201  0.19100626  0.14517247  0.23814955  0.1899649
   0.20181049  0.25663533  0.21003358  0.10436352  0.2038447   1.
   0.63646664  0.34827262  0.24316454  0.1362165   0.63646664  0.15762204
   0.31692202  0.12114576  0.35917146]
 [ 0.          0.          0.58644824  0.4977052   0.15953415  0.46110612
   0.42580993  0.3236768   0.48874263  0.44671607  0.59153001  0.57868948
   0.27357541  0.51645488  0.43317846  0.50985032  0.37317457  0.63646664
   1.          0.51529235  0.56963948  0.51218525  1.          0.38345582
   0.55396192  0.32287605  0.46700191]
 [ 0.          0.          0.          0.6089113   0.53873289  0.3367261
   0.29264493  0.13232082  0.43288206  0.80079927  0.37842518  0.33658945
   0.61990095  0.54372307  0.49982101  0.23555037  0.39283379  0.48492084
   0.58644824  0.64524906  0.31279271  0.39476181  0.58644824  0.39028705
   0.43856802  0.32296735  0.5541861 ]]

So how can I take the mean of only the nonzero values?

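My other question is how to remove all values equal to some value. As I pointed out above, I probably have to turn those values into zero? But how do I do that? For example, I want to remove all values that are equal to 1.0 or greater. Here is the code I have so far to build the matrix: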

from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

vectorized_words = sparse.csr_matrix(vectorize_words(nostopwords, glove_dict))

#calculating the distance/similarity between each vector in the matrix
cos_similiarity = cosine_similarity(vectorized_words, dense_output=False)
# since there are duplicates like (5,0) and (0,5) which we should remove, I use scipy's triu function
coo_cossim = cos_similiarity.tocoo()
vector_similarities = sparse.triu(coo_cossim, k = 1).tocsr()

Answer 1

Jam*_*mes 5

Yes, csr_matrix.mean() does include all of the zeros when computing the mean. As a simple example:

from scipy.sparse import csr_matrix

m = csr_matrix(([1,1], ([2,3],[3,3])), shape=(5,5))
m.toarray()

# returns:
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0]], dtype=int32)

# test the mean method
m.mean(), m.mean(axis=0), m.mean(axis=1)

# returns:
0.080000000000000002,
matrix([[ 0. ,  0. ,  0. ,  0.4,  0. ]]),
matrix([[ 0. ],
        [ 0. ],
        [ 0.2],
        [ 0.2],
        [ 0. ]])
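
To make the denominator explicit, here is a minimal sketch that reuses the 5x5 example above and checks that mean() divides by the total number of cells (rows times columns) rather than by the number of stored entries. The variable names are only illustrative.

```python
from scipy.sparse import csr_matrix

# Same 5x5 example as above: two stored ones, 23 implicit zeros.
m = csr_matrix(([1, 1], ([2, 3], [3, 3])), shape=(5, 5))

n_cells = m.shape[0] * m.shape[1]   # every cell, zeros included
n_stored = m.nnz                    # only the entries actually stored

mean_all = m.mean()                 # what the sparse matrix reports
manual_mean = m.sum() / n_cells     # sum of all values over all cells

print(n_cells, n_stored, mean_all, manual_mean)
```

Both values agree, which is why a mostly empty similarity matrix yields a mean far below its typical nonzero entry.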

If you need a calculation that excludes the zeros, you have to build the result another way. That is not hard to do, though:

nonzero_mean = m.sum() / m.count_nonzero()
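
As a slightly fuller sketch of the same idea, the block below computes the nonzero mean for the whole matrix and for each row. The small `dense` array is a made-up stand-in for an upper-triangular similarity matrix, and the per-row handling of empty rows (returning 0.0) is an assumption you may want to change.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical upper-triangular similarity matrix (illustration only).
dense = np.array([[0.0, 0.5, 0.25],
                  [0.0, 0.0, 0.75],
                  [0.0, 0.0, 0.0]])
m = csr_matrix(dense)
m.eliminate_zeros()  # make sure no explicit zeros are stored

# Overall mean of the nonzero entries.
overall = m.sum() / m.count_nonzero()

# Per-row mean of the nonzero entries.
row_sums = np.asarray(m.sum(axis=1)).ravel()
row_counts = np.diff(m.indptr)  # stored entries in each row of a CSR matrix
row_means = np.divide(row_sums, row_counts,
                      out=np.zeros_like(row_sums),
                      where=row_counts > 0)  # empty rows stay at 0.0

print(overall, row_means)
```

Note that count_nonzero() ignores explicitly stored zeros, while nnz and indptr count every stored entry, so calling eliminate_zeros() first keeps the two consistent.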

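
For the second part of the question (dropping every value equal to 1.0 or greater), one possible approach is to overwrite those stored values with zero and then remove them from the sparse structure. This is only a sketch on a made-up matrix; the 1.0 threshold comes from the question, everything else is illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical similarity matrix with one entry at the 1.0 threshold.
dense = np.array([[0.0, 0.5, 1.0],
                  [0.0, 0.0, 0.75],
                  [0.0, 0.0, 0.0]])
m = csr_matrix(dense)

filtered = m.copy()                         # leave the original untouched
filtered.data[filtered.data >= 1.0] = 0.0   # overwrite the unwanted stored values
filtered.eliminate_zeros()                  # drop them from the sparse structure

# Mean of what is left; guard against an empty matrix in real code.
nonzero_mean = filtered.sum() / filtered.nnz

print(filtered.nnz, nonzero_mean)
```

Because cosine similarities are floating-point values, an entry that should be exactly 1.0 may come out marginally below it, so a threshold such as `>= 1.0 - 1e-9` may be safer in practice.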
