abu*_*kaj 8 python performance numpy standard-library augmented-assignment
Is there a standard library/numpy equivalent of the following function:
def augmented_assignment_sum(iterable, start=0):
    for n in iterable:
        start += n
    return start
?
While sum(ITERABLE) is very elegant, it uses the + operator instead of +=, which may hurt performance when the items are np.ndarray objects.
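To make the difference between + and += concrete, here is a minimal sketch of how the function above behaves with small arrays. The variable names (`arrays`, `total`, `first`, `aliased`) are illustrative only. With the default start=0, the first step cannot run in place on an int, so it falls back to an ordinary addition that allocates a fresh array; every later step then reuses that array. If an array is passed as start, it is modified in place.

```python
import numpy as np

def augmented_assignment_sum(iterable, start=0):
    for n in iterable:
        start += n
    return start

arrays = [np.ones(3) for _ in range(4)]

# Default start=0: int has no in-place add, so the first "+=" evaluates
# 0 + arrays[0] and allocates a new array. Later steps update that new
# array in place, so none of the inputs is modified.
total = augmented_assignment_sum(arrays)
inputs_untouched = all((a == 1.0).all() for a in arrays)

# Passing an array as start: "+=" mutates that very array, so the
# caller's object is overwritten and returned.
first = arrays[0]
aliased = augmented_assignment_sum(arrays[1:], start=first)
start_was_overwritten = aliased is first
```

The second call shows the price of in-place accumulation: whoever owns the start array sees it change, so pass a copy if the original is still needed.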
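I have tested it, and my function can be as fast as sum() (while its equivalent using + is much slower). Since it is a pure Python function, I suspect its performance is still handicapped, so I am looking for an alternative: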
In [49]: ARRAYS = [np.random.random((1000000)) for _ in range(100)]
In [50]: def not_augmented_assignment_sum(iterable, start=0):
    ...:     for n in iterable:
    ...:         start = start + n
    ...:     return start
    ...:
In [51]: %timeit not_augmented_assignment_sum(ARRAYS)
63.6 ms ± 8.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [52]: %timeit sum(ARRAYS)
31.2 ms ± 2.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [53]: %timeit augmented_assignment_sum(ARRAYS)
31.2 ms ± 4.73 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [54]: %timeit not_augmented_assignment_sum(ARRAYS)
62.5 ms ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [55]: %timeit sum(ARRAYS)
37 ms ± 9.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [56]: %timeit augmented_assignment_sum(ARRAYS)
27.7 ms ± 2.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
I tried functools.reduce combined with operator.iadd, but its performance is similar:
In [79]: %timeit reduce(iadd, ARRAYS, 0)
33.4 ms ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [80]: %timeit reduce(iadd, ARRAYS, 0)
29.4 ms ± 2.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
I am also interested in memory efficiency, so I prefer augmented assignments, as they do not require creating intermediate objects.
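As an illustration of the memory argument, the following is a minimal sketch of an accumulator that allocates exactly one output buffer and then adds every term into it through the `out=` argument of np.add. The name `preallocated_sum` is hypothetical, and the sketch assumes all terms share the same shape and dtype; it is not claimed to be faster than the loop above.

```python
import numpy as np

def preallocated_sum(arrays):
    """Sum same-shaped arrays using a single accumulator buffer."""
    arrays = iter(arrays)
    try:
        first = next(arrays)
    except StopIteration:
        return 0  # mirror sum([]) for an empty input
    # One explicit copy, so the caller's first array is left untouched.
    acc = np.array(first, copy=True)
    for a in arrays:
        # Write the result straight into acc; no temporary array is created.
        np.add(acc, a, out=acc)
    return acc

terms = [np.arange(4.0) * k for k in range(1, 6)]
result = preallocated_sum(terms)
```

Unlike stacking the terms into one 2-D array and reducing along an axis, this never holds more than one extra array of the operand size.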
The answer to the headline question, straight from the horse's mouth (I hope @Martijn Pieters will forgive my choice of metaphor), is: no, there is no such builtin.
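If we allow ourselves a few lines of code to implement such an equivalent, we get a rather complicated picture, in which the fastest method depends very much on operand size: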
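The graph shows the timings of the different methods relative to sum as a function of operand size; the number of terms is always 100. augmented_assignment_sum starts to pay off at relatively large operand sizes. Using scipy.linalg.blas.*axpy looks quite competitive over most of the tested range; its main drawback is that it is less easy to use than sum.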
Code:
from simple_benchmark import BenchmarkBuilder, MultiArgument
import numpy as np
from scipy.linalg import blas

B = BenchmarkBuilder()

@B.add_function()
def augmented_assignment_sum(iterable, start=0):
    for n in iterable:
        start += n
    return start

@B.add_function()
def not_augmented_assignment_sum(iterable, start=0):
    for n in iterable:
        start = start + n
    return start

@B.add_function()
def plain_sum(iterable, start=0):
    return sum(iterable, start)

@B.add_function()
def blas_sum(iterable, start=None):
    iterable = iter(iterable)
    if start is None:
        try:
            start = next(iterable).copy()
        except StopIteration:
            return 0
    # pick the axpy variant that matches the accumulator's dtype
    try:
        f = {np.dtype('float32'): blas.saxpy,
             np.dtype('float64'): blas.daxpy,
             np.dtype('complex64'): blas.caxpy,
             np.dtype('complex128'): blas.zaxpy}[start.dtype]
    except KeyError:
        # any other dtype: fall back to a float64 accumulator
        f = blas.daxpy
        start = start.astype(float)
    for n in iterable:
        # the return value is ignored; this relies on axpy updating start in place
        f(n, start)
    return start

@B.add_arguments('size of terms')
def argument_provider():
    for exp in range(1, 21):
        sz = int(2**exp)
        yield sz, [np.random.randn(sz) for _ in range(100)]

r = B.run()
r.plot(relative_to=plain_sum)

import pylab
pylab.savefig('inplacesum.png')
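If simple_benchmark is not available, a rough spot check at a single operand size can be done with the standard library's timeit. This is only a sketch: the helper `time_candidates` and its parameters are hypothetical names, it covers just the pure Python and builtin variants, and the numbers it produces will vary by machine.

```python
import timeit
import numpy as np

def augmented_assignment_sum(iterable, start=0):
    for n in iterable:
        start += n
    return start

def not_augmented_assignment_sum(iterable, start=0):
    for n in iterable:
        start = start + n
    return start

def time_candidates(size, n_terms=100, number=20):
    """Return the best per-call time in seconds for each candidate."""
    arrays = [np.random.randn(size) for _ in range(n_terms)]
    candidates = {
        "sum": sum,
        "augmented": augmented_assignment_sum,
        "not_augmented": not_augmented_assignment_sum,
    }
    timings = {}
    for name, func in candidates.items():
        # Take the minimum over a few repeats to reduce noise.
        runs = timeit.repeat(lambda f=func: f(arrays), number=number, repeat=3)
        timings[name] = min(runs) / number
    return timings

timings = time_candidates(size=1024)
```

Dividing each entry by timings["sum"] gives the same kind of relative figure as the plot, for one point on the size axis.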