Iterating through a scipy.sparse vector (or matrix)

Ran*_*Guy 41 python scipy sparse-matrix

I'm wondering what the best way is to iterate over the nonzero entries of a sparse matrix with scipy.sparse. For example, if I do the following:

from scipy.sparse import lil_matrix

x = lil_matrix( (20,1) )
x[13,0] = 1
x[15,0] = 2

c = 0
for i in x:
  print c, i
  c = c+1

the output is

0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13   (0, 0) 1.0
14 
15   (0, 0) 2.0
16 
17 
18 
19  

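So it looks like the iterator is touching every element, not just the nonzero entries. I've had a look at the API

http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.lil_matrix.html

and searched around a bit, but I can't seem to find a solution that works.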

unu*_*tbu 61

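Edit: bbtrb's method (using coo_matrix) is much faster than my original suggestion, which used nonzero. Sven Marnach's suggestion to use itertools.izip also improves the speed. The fastest so far is using_tocoo_izip: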

import scipy.sparse
import random
import itertools

def using_nonzero(x):
    rows,cols = x.nonzero()
    for row,col in zip(rows,cols):
        ((row,col), x[row,col])

def using_coo(x):
    cx = scipy.sparse.coo_matrix(x)    
    for i,j,v in zip(cx.row, cx.col, cx.data):
        (i,j,v)

def using_tocoo(x):
    cx = x.tocoo()    
    for i,j,v in zip(cx.row, cx.col, cx.data):
        (i,j,v)

def using_tocoo_izip(x):
    cx = x.tocoo()    
    for i,j,v in itertools.izip(cx.row, cx.col, cx.data):
        (i,j,v)

N=200
x = scipy.sparse.lil_matrix( (N,N) )
for _ in xrange(N):
    x[random.randint(0,N-1),random.randint(0,N-1)]=random.randint(1,100)

yields these timeit results:

% python -mtimeit -s'import test' 'test.using_tocoo_izip(test.x)'
1000 loops, best of 3: 670 usec per loop
% python -mtimeit -s'import test' 'test.using_tocoo(test.x)'
1000 loops, best of 3: 706 usec per loop
% python -mtimeit -s'import test' 'test.using_coo(test.x)'
1000 loops, best of 3: 802 usec per loop
% python -mtimeit -s'import test' 'test.using_nonzero(test.x)'
100 loops, best of 3: 5.25 msec per loop

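  • The main point here is to use tocoo, but I strongly recommend not iterating at all! x.row, x.col and x.data are all numpy arrays and can be used directly in most cases. A simple example, setting each matrix element's value to the product of its indices: x.data[:] = x.col*x.row (3 upvotes)
  • For python3, you can use the built-in `zip` function: [python2to3](http://www.diveintopython3.net/porting-code-to-python-3-with-2to3.html) (3 upvotes)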
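
The loop-free approach suggested in the first comment could look roughly like this in Python 3. This is only a sketch: the matrix shape and entries are made up for illustration, and the assignment changes the COO copy `cx`, not the original LIL matrix `x`.

```python
from scipy.sparse import lil_matrix

# Illustrative matrix (shape and entries are assumptions for this sketch).
x = lil_matrix((5, 5))
x[1, 2] = 10
x[3, 4] = 20
x[4, 1] = 30

# In COO format, row, col and data are parallel NumPy arrays.
cx = x.tocoo()

# Set every stored entry to the product of its row and column index,
# with no Python-level loop.
cx.data[:] = cx.row * cx.col

result = cx.toarray()
```

If the change should end up in a LIL matrix again, `cx.tolil()` converts the result back.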

bbt*_*trb 32

The fastest way should be to convert to a coo_matrix:

cx = scipy.sparse.coo_matrix(x)

for i,j,v in zip(cx.row, cx.col, cx.data):
    print "(%d, %d), %s" % (i,j,v)

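For readers on Python 3, the same idea might be written as below, applied to the 20x1 matrix from the question. It is a sketch rather than the answer's own code: it uses `tocoo()` as in the benchmark above, and the `entries` list exists only to make the result easy to inspect.

```python
from scipy.sparse import lil_matrix

# The 20x1 example matrix from the question.
x = lil_matrix((20, 1))
x[13, 0] = 1
x[15, 0] = 2

cx = x.tocoo()

# The built-in zip is lazy in Python 3, so itertools.izip is not needed.
entries = [(int(i), int(j), float(v))
           for i, j, v in zip(cx.row, cx.col, cx.data)]

for i, j, v in entries:
    print("(%d, %d), %s" % (i, j, v))
```

Only the two stored entries are visited; the empty rows never appear in the loop.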

zer*_*oth 5

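To loop over the various sparse matrix types in scipy.sparse, I would use this small wrapper function (note that on Python 2 you are encouraged to use xrange and izip for better performance on large matrices):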

from scipy.sparse import *
def iter_spmatrix(matrix):
    """ Iterator for iterating the elements in a ``scipy.sparse.*_matrix`` 

    This will always return:
    >>> (row, column, matrix-element)

    Currently this can iterate `coo`, `csc`, `lil` and `csr`, others may easily be added.

    Parameters
    ----------
    matrix : ``scipy.sparse.spmatrix``
      the sparse matrix to iterate non-zero elements
    """
    if isspmatrix_coo(matrix):
        for r, c, m in zip(matrix.row, matrix.col, matrix.data):
            yield r, c, m

    elif isspmatrix_csc(matrix):
        for c in range(matrix.shape[1]):
            for ind in range(matrix.indptr[c], matrix.indptr[c+1]):
                yield matrix.indices[ind], c, matrix.data[ind]

    elif isspmatrix_csr(matrix):
        for r in range(matrix.shape[0]):
            for ind in range(matrix.indptr[r], matrix.indptr[r+1]):
                yield r, matrix.indices[ind], matrix.data[ind]

    elif isspmatrix_lil(matrix):
        for r in range(matrix.shape[0]):
            for c, d in zip(matrix.rows[r], matrix.data[r]):
                yield r, c, d

    else:
        raise NotImplementedError("The iterator for this sparse matrix has not been implemented")
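
A shorter, format-agnostic alternative is to convert whatever comes in to COO first, as the other answers do. The helper name `iter_nonzero` and the test matrix below are hypothetical, and the conversion makes a copy, which the per-format wrapper above avoids. Treat it as a sketch of the trade-off, not a replacement.

```python
from scipy.sparse import csc_matrix, csr_matrix, lil_matrix


def iter_nonzero(matrix):
    """Yield (row, column, value) for each stored entry of a sparse matrix.

    Hypothetical helper: converts to COO instead of handling each
    format separately.
    """
    cx = matrix.tocoo()
    for r, c, v in zip(cx.row, cx.col, cx.data):
        yield int(r), int(c), v


# Small made-up matrix, built in three different sparse formats.
dense = [[0, 3, 0],
         [0, 0, 0],
         [7, 0, 5]]

for fmt in (csr_matrix, csc_matrix, lil_matrix):
    print(fmt.__name__, sorted(iter_nonzero(fmt(dense))))
```

Like the wrapper, this visits stored entries, so explicitly stored zeros would be yielded as well.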