npi*_*nto 10 python optimization numpy cython scipy
I am trying to use, and speed up, fancy indexing to "join" two arrays and sum over one of the result's axes.
Something like this:
$ ipython
In [1]: import numpy as np
In [2]: ne, ds = 12, 6
In [3]: i = np.random.randn(ne, ds).astype('float32')
In [4]: t = np.random.randint(0, ds, size=(int(1e5), ne)).astype('uint8')
In [5]: %timeit i[np.arange(ne), t].sum(-1)
10 loops, best of 3: 44 ms per loop
Is there a simple way to speed up statement In [5]? Should I turn to OpenMP and something like scipy.weave or Cython's prange?
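For some reason, numpy.take is much faster than fancy indexing. The only catch is that, by default, it treats the array as flat, so you have to compute the flat indices yourself.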
In [1]: a = np.random.randn(12,6).astype(np.float32)
In [2]: c = np.random.randint(0,6,size=(int(1e5),12)).astype(np.uint8)
In [3]: r = np.arange(12)
In [4]: %timeit a[r,c].sum(-1)
10 loops, best of 3: 46.7 ms per loop
In [5]: rr, cc = np.broadcast_arrays(r,c)
In [6]: flat_index = rr*a.shape[1] + cc
In [7]: %timeit a.take(flat_index).sum(-1)
100 loops, best of 3: 5.5 ms per loop
In [8]: (a.take(flat_index).sum(-1) == a[r,c].sum(-1)).all()
Out[8]: True
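For reuse outside an interactive session, the same idea can be wrapped in a small function. This is only a sketch: the name `gather_sum` is made up for illustration, it assumes `a` is a C-contiguous 2-D array and that `c` holds valid column indices with one column per row of `a`, and it makes no claim about timings on your machine.

```python
import numpy as np

def gather_sum(a, c):
    """Return sum over j of a[j, c[n, j]] for every row n of c (hypothetical helper)."""
    n_rows, n_cols = a.shape
    r = np.arange(n_rows)
    # Row-major flat offset of element (r, c); c is cast to a wide integer
    # type first so the arithmetic never happens in uint8.
    flat_index = r * n_cols + c.astype(np.intp)
    # take() with no axis argument indexes into the flattened array.
    return a.take(flat_index).sum(-1)

rng = np.random.RandomState(0)
a = rng.randn(12, 6).astype(np.float32)
c = rng.randint(0, 6, size=(1000, 12)).astype(np.uint8)

result = gather_sum(a, c)
# Fancy-indexing version from the question, used here as a reference.
reference = a[np.arange(12), c].sum(-1)
```

Comparing `result` against `reference` with `np.allclose` is a cheap sanity check before swapping the fancy-indexing version out.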
Beyond that, I think the only other way you would see a speed improvement is to write a custom kernel for the GPU using something like PyCUDA.