Jos*_*ngs 6 python arrays numpy vectorization
I want to vectorize calls like numpy.arange(0, cnt_i) over a vector of cnt values and concatenate the results, as in this snippet:
import numpy
cnts = [1,2,3]
numpy.concatenate([numpy.arange(cnt) for cnt in cnts])
array([0, 0, 1, 0, 1, 2])
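Unfortunately, the code above is very inefficient because of the temporary arrays and the list-comprehension loop.
Is there a way to do this more efficiently in numpy?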
Here is a fully vectorized function:
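import numpy as np

def multirange(counts):
    counts = np.asarray(counts)
    # Remove the following line if counts is always strictly positive.
    counts = counts[counts != 0]
    counts1 = counts[:-1]
    # Positions in the output where a new range starts (all but the first).
    reset_index = np.cumsum(counts1)
    # Increment of 1 everywhere, so that cumsum counts upward...
    incr = np.ones(counts.sum(), dtype=int)
    incr[0] = 0
    # ...except at range boundaries, where the running total drops back to 0.
    incr[reset_index] = 1 - counts1
    # Reuse the incr array for the final result.
    incr.cumsum(out=incr)
    return incr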
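To make the cumulative-sum trick easier to follow, the snippet below traces the intermediate arrays of the function above for counts = [1, 2, 3]. It is only an illustrative walk-through; the variable steps is a name introduced here and does not appear in the answer.

```python
import numpy as np

counts = np.asarray([1, 2, 3])

# All counts except the last one: [1, 2]
counts1 = counts[:-1]

# Output positions where the second and third ranges begin: [1, 3]
reset_index = np.cumsum(counts1)

# Start with an increment of 1 at every position, 0 at the very first.
incr = np.ones(counts.sum(), dtype=int)
incr[0] = 0

# At each boundary, replace the increment with (1 - previous count) so the
# running total falls from (previous count - 1) back to 0.
incr[reset_index] = 1 - counts1

# Hypothetical name, kept only to inspect the increments before summing.
steps = incr.copy()          # values: [0, 0, 1, -1, 1, 1]

result = np.cumsum(incr)     # values: [0, 0, 1, 0, 1, 2]
print(steps.tolist())
print(result.tolist())
```

The increment at a boundary is 1 - counts1 rather than -counts1 because the last value of a range of length n is n - 1, not n.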
Here is a variation of @Developer's answer that calls arange only once:
def multirange_loop(counts):
    counts = np.asarray(counts)
    ranges = np.empty(counts.sum(), dtype=int)
    # One shared range, long enough for the largest count.
    seq = np.arange(counts.max())
    # Offset in the output at which each individual range begins.
    starts = np.zeros(len(counts), dtype=int)
    starts[1:] = np.cumsum(counts[:-1])
    for start, count in zip(starts, counts):
        ranges[start:start + count] = seq[:count]
    return ranges
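For comparison, the same start offsets can also be used without any Python loop by subtracting them from one global index. This is a different formulation from the ones given in the answers, sketched here with np.repeat; the name multirange_repeat is hypothetical, and it has not been timed against the functions above.

```python
import numpy as np

def multirange_repeat(counts):
    """Sketch: concatenated aranges via np.repeat (hypothetical helper)."""
    counts = np.asarray(counts, dtype=int)
    # Offset in the output at which each range begins.
    starts = np.cumsum(counts) - counts
    # A single global index minus the start offset of the range it falls in.
    return np.arange(counts.sum()) - np.repeat(starts, counts)

print(multirange_repeat([1, 2, 3]).tolist())
```

Because np.repeat simply emits nothing for a count of zero, this version needs no separate filtering step for zero counts.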
Here is the original version, written as a function:
def multirange_original(counts):
    ranges = np.concatenate([np.arange(count) for count in counts])
    return ranges
Demo:
In [296]: multirange_original([1,2,3])
Out[296]: array([0, 0, 1, 0, 1, 2])
In [297]: multirange_loop([1,2,3])
Out[297]: array([0, 0, 1, 0, 1, 2])
In [298]: multirange([1,2,3])
Out[298]: array([0, 0, 1, 0, 1, 2])
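A single input is a thin check, so one might also compare the vectorized function against the list-comprehension version on random inputs. The sketch below repeats the two definitions so that it runs on its own; the seed, the number of trials, and the explicit zero-count case are arbitrary choices made here, not part of the answer.

```python
import numpy as np

def multirange_original(counts):
    return np.concatenate([np.arange(count) for count in counts])

def multirange(counts):
    counts = np.asarray(counts)
    counts = counts[counts != 0]
    counts1 = counts[:-1]
    reset_index = np.cumsum(counts1)
    incr = np.ones(counts.sum(), dtype=int)
    incr[0] = 0
    incr[reset_index] = 1 - counts1
    incr.cumsum(out=incr)
    return incr

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
for _ in range(100):
    counts = rng.integers(1, 50, size=50)
    assert np.array_equal(multirange(counts), multirange_original(counts))

# A zero count in the middle is dropped by the filtering line.
assert np.array_equal(multirange([2, 0, 3]), multirange_original([2, 0, 3]))
print("all checks passed")
```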
Comparing timings using a larger array of counts:
In [299]: counts = np.random.randint(1, 50, size=50)
In [300]: %timeit multirange_original(counts)
10000 loops, best of 3: 114 µs per loop
In [301]: %timeit multirange_loop(counts)
10000 loops, best of 3: 76.2 µs per loop
In [302]: %timeit multirange(counts)
10000 loops, best of 3: 26.4 µs per loop
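The %timeit magic is only available in IPython. Outside it, a similar best-of-N measurement can be approximated with the standard-library timeit module. The helper best_time_us below is a hypothetical name, the repeat and loop counts are arbitrary, and the resulting numbers will vary by machine and NumPy version, so they should not be expected to match the figures above.

```python
import timeit

import numpy as np

def best_time_us(func, arg, repeat=3, number=200):
    """Best per-call time in microseconds over `repeat` runs (hypothetical helper)."""
    totals = timeit.repeat(lambda: func(arg), repeat=repeat, number=number)
    return min(totals) / number * 1e6

def multirange_original(counts):
    return np.concatenate([np.arange(count) for count in counts])

counts = np.random.randint(1, 50, size=50)
elapsed = best_time_us(multirange_original, counts)
print(f"multirange_original: {elapsed:.1f} µs per loop")
```

Any of the other functions can be passed to the same helper to reproduce the comparison on a given machine.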