Expanding (adding a row or column to) a scipy.sparse matrix

Question

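Suppose I have an NxN matrix M (lil_matrix or csr_matrix) from scipy.sparse, and I want to make it (N+1)xN, where M_modified[i,j] = M[i,j] for 0 <= i < N (and all j) and M_modified[N,j] = 0 for all j. Basically, I want to add a row of zeros to the bottom of M and preserve the rest of the matrix. Is there a way to do this without copying the data?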

Answer 1

Jak*_*keM 26

Scipy doesn't have a way to do this without copying the data, but you can do it yourself by changing the attributes that define the sparse matrix.

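A csr_matrix is made up of 4 attributes:

data: an array containing the actual values in the matrix

indices: an array containing the column index corresponding to each value in data

indptr: an array giving, for each row, the offset in data of that row's first value, so row i occupies data[indptr[i]:indptr[i+1]]. If a row is empty, its entry is the same as the next row's.

shape: a tuple containing the shape of the matrix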
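To make the four attributes concrete, here is a minimal sketch on a small matrix with an empty middle row. The matrix values and variable names are made up for illustration; only the attribute names come from scipy.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A 3x4 matrix whose middle row is empty.
dense = np.array([[1, 0, 2, 0],
                  [0, 0, 0, 0],
                  [0, 3, 0, 4]])
m = csr_matrix(dense)

print(m.data)     # stored non-zero values, row by row
print(m.indices)  # column index of each stored value
print(m.indptr)   # row i occupies data[indptr[i]:indptr[i + 1]]
print(m.shape)

# Values and columns of the last row, read straight from the three arrays.
start, end = m.indptr[2], m.indptr[3]
row2_values = m.data[start:end]
row2_columns = m.indices[start:end]
```

The empty row shows up as two equal neighbouring entries in indptr, which is why a trailing empty row only needs one more entry repeating the last value.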

If you are simply adding a row of zeros to the bottom, all you have to change is the shape and indptr of the matrix.

import numpy as np
from scipy.sparse import csr_matrix

x = np.ones((3,5))
x = csr_matrix(x)
x.toarray()
# array([[ 1.,  1.,  1.,  1.,  1.],
#        [ 1.,  1.,  1.,  1.,  1.],
#        [ 1.,  1.,  1.,  1.,  1.]])

# reshape cannot add rows to a csr_matrix, but you can cheat and set the shape yourself.
x._shape = (4,5)
# Update indptr to let it know we added a row with nothing in it: just append the last
# value in indptr to the end.
# Note that you are still copying the indptr array.
x.indptr = np.hstack((x.indptr,x.indptr[-1]))
x.toarray()
# array([[ 1.,  1.,  1.,  1.,  1.],
#        [ 1.,  1.,  1.,  1.,  1.],
#        [ 1.,  1.,  1.,  1.,  1.],
#        [ 0.,  0.,  0.,  0.,  0.]])
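As an alternative to touching the private _shape attribute, newer SciPy releases provide an in-place resize method on sparse matrices. The sketch below assumes an installed SciPy version that has it. It is not part of the original answer, and it makes no claim about how much copying resize does internally.

```python
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.ones((3, 5)))

# Grow the matrix in place from 3x5 to 4x5; the new last row holds no values.
# Assumption: the installed SciPy version provides csr_matrix.resize().
x.resize((4, 5))

result = x.toarray()
print(result)
```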


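Here is a function that handles the more general case of vstacking any 2 csr_matrices. You still end up copying the underlying numpy arrays, but it is still significantly faster than the scipy vstack method.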

import numpy as np

def csr_vappend(a, b):
    """Takes in 2 csr_matrices and appends the second one to the bottom of the first one.
    Much faster than scipy.sparse.vstack but assumes the type to be csr and overwrites
    the first matrix instead of copying it. The data, indices, and indptr still get copied."""
    # Number of values stored in a; read it before a's arrays are replaced.
    offset = a.indptr[-1]
    a.data = np.hstack((a.data, b.data))
    a.indices = np.hstack((a.indices, b.indices))
    # Shift b's row pointers past a's values and drop b's leading 0.
    a.indptr = np.hstack((a.indptr, (b.indptr + offset)[1:]))
    a._shape = (a.shape[0] + b.shape[0], b.shape[1])
    return a
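A quick way to sanity-check a function like this is to compare its result with scipy.sparse.vstack on small inputs. The sketch below repeats the function body so it runs on its own. The test matrices top and bottom are made up for illustration, and the check says nothing about speed.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

def csr_vappend(a, b):
    # Same logic as the answer's function, repeated so this snippet runs alone.
    offset = a.indptr[-1]
    a.data = np.hstack((a.data, b.data))
    a.indices = np.hstack((a.indices, b.indices))
    a.indptr = np.hstack((a.indptr, b.indptr[1:] + offset))
    a._shape = (a.shape[0] + b.shape[0], b.shape[1])
    return a

top = csr_matrix(np.array([[1.0, 0.0, 2.0],
                           [0.0, 0.0, 3.0]]))
bottom = csr_matrix(np.array([[0.0, 4.0, 0.0]]))

# Compute the reference result first, because csr_vappend overwrites `top`.
expected = vstack([top, bottom]).toarray()
combined = csr_vappend(top, bottom)

matches = np.array_equal(combined.toarray(), expected)
print(matches)
```

Note that combined and top are the same object afterwards, which is the trade-off the docstring describes.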


Answer 2

Sid*_*ant 8

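Not sure if you are still looking for a solution, but maybe others can look into hstack and vstack - http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.hstack.html . I think we can define a csr_matrix for the single additional row and then vstack it with the previous matrix.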

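As the [source code of vstack](https://github.com/scipy/scipy/blob/v0.19.0/scipy/sparse/construct.py#L461-L492) shows, it returns a new copy of the input matrices, so it is not efficient enough if we want to expand the matrix **in place**. (2 upvotes)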
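The vstack approach suggested in this answer can be sketched as follows. The size n and the identity matrix are placeholders chosen for illustration. As the comment above points out, the result is a new matrix and the original is left untouched.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

n = 3
m = csr_matrix(np.eye(n))

# An empty 1 x n matrix acts as the row of zeros.
zero_row = csr_matrix((1, n), dtype=m.dtype)

# vstack builds and returns a new (n + 1) x n matrix; m itself is not modified.
m_extended = vstack([m, zero_row], format="csr")
print(m_extended.shape)
```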

Answer 3

Jus*_*eel 6

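I don't think there is any way to really escape copying. Both of these types of sparse matrices store their data as Numpy arrays (in the data and indices attributes for csr, and in the data and rows attributes for lil), and Numpy arrays can't be extended.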

Update with more information:

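LIL does stand for LInked List, but the current implementation doesn't quite live up to the name. The Numpy arrays used for data and rows are both of type object. Each of the objects in these arrays is actually a Python list (an empty list when all the values in a row are zero). Python lists aren't exactly linked lists, but they are close and, frankly, a better choice because of O(1) lookup. Personally, I don't immediately see the point of using a Numpy array of objects here rather than just a Python list. You could fairly easily change the current lil implementation to use Python lists instead, which would allow you to add a row without copying the whole matrix.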
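The layout described above can be inspected directly. This is a small sketch with made-up values; it only reads the rows and data attributes of a lil_matrix and does not attempt the change to the implementation proposed in the answer.

```python
import numpy as np
from scipy.sparse import lil_matrix

m = lil_matrix((3, 4))
m[0, 1] = 5.0
m[2, 0] = 7.0
m[2, 3] = 9.0

# Both attributes are Numpy arrays of dtype object, with one Python list per row.
rows_dtype = m.rows.dtype
data_dtype = m.data.dtype

first_row_columns = m.rows[0]   # column indices stored for row 0
first_row_values = m.data[0]    # matching values for row 0
empty_row_columns = m.rows[1]   # a row with no stored values

print(rows_dtype, type(first_row_columns))
print(m.rows)
print(m.data)
```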

Asked:	15 years ago
Viewed:	20474 times
Last active:	12 years, 7 months ago