scipy.sparse.hstack(([1],[2])) -> "ValueError: blocks must be 2-D". Why?

Fra*_*urt 5 python scipy sparse-matrix

scipy.sparse.hstack((1, [2])) and scipy.sparse.hstack((1, 2)) both work fine, but scipy.sparse.hstack(([1], [2])) does not. Why is that?

Here is what happens on my system:


C:\Anaconda>python
Python 2.7.10 |Anaconda 2.3.0 (64-bit)| (default, May 28 2015, 16:44:52) [MSC v.1500 64 bit (AMD64)] on win32
>>> import scipy.sparse
>>> scipy.sparse.hstack((1, [2]))
<1x2 sparse matrix of type '<type 'numpy.int32'>'
        with 2 stored elements in COOrdinate format>
>>> scipy.sparse.hstack((1, 2))
<1x2 sparse matrix of type '<type 'numpy.int32'>'
        with 2 stored elements in COOrdinate format>
>>> scipy.sparse.hstack(([1], [2]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 456, in hstack
    return bmat([blocks], format=format, dtype=dtype)
  File "C:\Anaconda\lib\site-packages\scipy\sparse\construct.py", line 539, in bmat
    raise ValueError('blocks must be 2-D')
ValueError: blocks must be 2-D
>>> scipy.version.full_version
'0.16.0'
>>>

ray*_*ica 7

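In the first case, scipy.sparse.hstack((1, [2])), the number 1 is interpreted as a scalar value and the number 2 is interpreted as a dense matrix. When you combine the two, the data types are coerced so that both are treated as scalars, and scipy.sparse.hstack can combine them normally.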

Here are some more tests showing that this holds for several values:

In [31]: scipy.sparse.hstack((1,2,[3],[4]))
Out[31]: 
<1x4 sparse matrix of type '<type 'numpy.int64'>'
    with 4 stored elements in COOrdinate format>

In [32]: scipy.sparse.hstack((1,2,[3],[4],5,6))
Out[32]: 
<1x6 sparse matrix of type '<type 'numpy.int64'>'
    with 6 stored elements in COOrdinate format>

In [33]: scipy.sparse.hstack((1,[2],[3],[4],5,[6],7))
Out[33]: 
<1x7 sparse matrix of type '<type 'numpy.int64'>'

As you can see, this seems to work as long as at least one scalar is present in the hstack call.

However, in the second case, scipy.sparse.hstack(([1],[2])), the inputs are no longer scalars. Both are dense matrices, and you cannot use scipy.sparse.hstack on purely dense matrices.

To reproduce:

In [34]: scipy.sparse.hstack(([1],[2]))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-45-cd79952b2e14> in <module>()
----> 1 scipy.sparse.hstack(([1],[2]))

/usr/local/lib/python2.7/site-packages/scipy/sparse/construct.pyc in hstack(blocks, format, dtype)
    451 
    452     """
--> 453     return bmat([blocks], format=format, dtype=dtype)
    454 
    455 

/usr/local/lib/python2.7/site-packages/scipy/sparse/construct.pyc in bmat(blocks, format, dtype)
    531 
    532     if blocks.ndim != 2:
--> 533         raise ValueError('blocks must be 2-D')
    534 
    535     M,N = blocks.shape

ValueError: blocks must be 2-D

See this post for more insight: Scipy error with sparse hstack

Therefore, if you want to use it successfully with two matrices, you must convert them to sparse matrices first and then combine them:

In [36]: A = scipy.sparse.coo_matrix([1])

In [37]: B = scipy.sparse.coo_matrix([2])

In [38]: C = scipy.sparse.hstack([A, B])

In [39]: C
Out[39]: 
<1x2 sparse matrix of type '<type 'numpy.int64'>'
    with 2 stored elements in COOrdinate format>
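The conversion step above can be wrapped in a small helper so that any mix of scalars, lists, and sparse matrices is made sparse before stacking. This is only a sketch: `hstack_as_sparse` is a hypothetical name, not part of SciPy, and it assumes every dense block is meant to be at most 2-D.

```python
import numpy as np
import scipy.sparse as sp

def hstack_as_sparse(blocks, format=None):
    """Hypothetical helper: make every block a 2-D sparse matrix, then hstack."""
    sparse_blocks = [
        # Leave sparse inputs alone; promote scalars and lists to 2-D first.
        b if sp.issparse(b) else sp.coo_matrix(np.atleast_2d(b))
        for b in blocks
    ]
    return sp.hstack(sparse_blocks, format=format)

C = hstack_as_sparse(([1], [2]))
print(C.shape)      # (1, 2)
print(C.toarray())  # [[1 2]]
```

Because the helper always hands real sparse matrices to hstack, it does not rely on how hstack happens to treat raw lists or scalars.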

Interestingly, if you try the same thing with the dense version of hstack, numpy.hstack, it is perfectly acceptable:

In [48]: import numpy as np

In [49]: np.hstack((1, [2]))
Out[49]: array([1, 2])

....it just gets messed up for sparse matrix representations ¯\_(ツ)_/¯.
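If the inputs are dense anyway, one possible workaround is to stack them with numpy.hstack and convert the result to sparse afterwards. The sketch below assumes the stacked data is small enough to hold densely; the `np.atleast_2d` call is my own addition to make the 2-D shape explicit.

```python
import numpy as np
import scipy.sparse as sp

# Stack densely first; numpy.hstack accepts plain lists.
dense = np.hstack(([1], [2]))
print(dense)    # [1 2]

# Then convert the stacked row to a sparse matrix.
S = sp.coo_matrix(np.atleast_2d(dense))
print(S.shape)  # (1, 2)
```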


hpa*_*ulj 3

The relevant code is:

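import numpy as np

def hstack(blocks, format=None, dtype=None):
    # hstack wraps the blocks in one more list, making a single block row
    return bmat([blocks], format=format, dtype=dtype)

def bmat(blocks, format=None, dtype=None):
    blocks = np.asarray(blocks, dtype='object')
    if blocks.ndim != 2:
        raise ValueError('blocks must be 2-D')
    # ... (the rest of bmat continues from here)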

So try each of your alternatives (remembering the extra [] that hstack adds):

In [392]: np.asarray([(1,2)],dtype=object)
Out[392]: array([[1, 2]], dtype=object)

In [393]: np.asarray([(1,[2])],dtype=object)
Out[393]: array([[1, [2]]], dtype=object)

In [394]: np.asarray([([1],[2])],dtype=object)
Out[394]: 
array([[[1],
        [2]]], dtype=object)

In [395]: _.shape
Out[395]: (1, 2, 1)

The last case (your problem case) fails because the result is 3-D.
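The same check can be run on all three inputs at once. This is a minimal sketch that imitates only the wrap-and-convert step quoted above; `blocks_ndim` is a hypothetical helper name, and it does not call SciPy at all.

```python
import numpy as np

def blocks_ndim(blocks):
    """Hypothetical helper: wrap like hstack does, convert like bmat does, return ndim."""
    return np.asarray([blocks], dtype=object).ndim

for blocks in [(1, 2), (1, [2]), ([1], [2])]:
    ndim = blocks_ndim(blocks)
    verdict = "passes the ndim check" if ndim == 2 else "raises 'blocks must be 2-D'"
    print(blocks, ndim, verdict)
```

Only the input whose elements are all lists of equal length gets an extra array dimension, which is what trips the `ndim != 2` test.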

With 2 sparse matrices (the expected input):

In [402]: np.asarray([[a,a]], dtype=object) 
Out[402]: 
array([[ <1x1 sparse matrix of type '<class 'numpy.int32'>'
    with 1 stored elements in COOrdinate format>,
        <1x1 sparse matrix of type '<class 'numpy.int32'>'
    with 1 stored elements in COOrdinate format>]], dtype=object)

In [403]: _.shape
Out[403]: (1, 2)

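hstack takes advantage of bmat by turning a list of matrices into a nested (2-D) list of matrices. bmat is meant to combine a 2-D array of sparse matrices into one larger matrix. Skipping the step of first making these sparse matrices may or may not work. The code and documentation make no promises.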
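For comparison, here is a short sketch of bmat used the way it is intended, with a 2-D grid of blocks that are already sparse. The matrices `A`, `B`, and `D` are made-up illustration values; a `None` entry stands for an all-zero block.

```python
import scipy.sparse as sp

A = sp.coo_matrix([[1, 2]])  # 1x2 block
B = sp.coo_matrix([[3]])     # 1x1 block
D = sp.coo_matrix([[4]])     # 1x1 block

# A single block row: this is the layout hstack builds from [A, B].
row = sp.bmat([[A, B]])
print(row.toarray())   # [[1 2 3]]

# A 2x2 grid of blocks; None fills the lower-left block with zeros.
grid = sp.bmat([[A, B], [None, D]])
print(grid.toarray())  # [[1 2 3]
                       #  [0 0 4]]
```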