numba什么时候有效？

Question

numba什么时候有效？

Mr.*_*Who 3 python performance python-3.x numba

我知道 numba 会产生一些开销，并且在某些情况下（非密集计算）它变得比纯 python 慢。但我不知道在哪里划线。是否可以使用算法复杂度的顺序来找出位置？

例如在这段代码中添加两个比 5 更短的数组 (~O(n)) 纯 python 更快：

def sum_1(a,b):
    result = 0.0
    for i,j in zip(a,b):
            result += (i+j)
    return result

@numba.jit('float64[:](float64[:],float64[:])')
def sum_2(a,b):
    result = 0.0
    for i,j in zip(a,b):
            result += (i+j)
    return result

# try 100
a = np.linspace(1.0,2.0,5)
b = np.linspace(1.0,2.0,5)
print("pure python: ")
%timeit -o sum_1(a,b)
print("\n\n\n\npython + numba: ")
%timeit -o sum_2(a,b)

Run Code Online (Sandbox Code Playgroud)

UPDADE：我所寻找的是一个类似的指导方针喜欢这里：

“一般准则是为不同的数据大小和算法选择不同的目标。“cpu”目标适用于小数据大小（大约小于 1KB）和低计算强度的算法。它具有最少的开销。“ “并行”目标适用于中等数据大小（约小于 1MB）。线程会增加小延迟。“cuda”目标适用于大数据大小（约大于 1MB）和高计算强度算法。将内存传输到并且来自 GPU 的开销显着增加。”

Answer 1

MSe*_*ert 11

当 numba 生效时，很难划清界限。但是，有一些指标可能无效：

如果您不能使用jitwith nopython=True- 每当您无法在 nopython 模式下编译它时，您要么尝试编译太多，要么不会明显更快。
如果您不使用数组 - 当您处理传递给 numba 函数的列表或其他类型（其他 numba 函数除外）时，numba 需要复制这些，这会产生大量开销。
如果已经有一个 NumPy 或 SciPy 函数可以做到这一点 - 即使 numba 对于短数组可以明显更快，它对于较长数组几乎总是一样快（你也可能很容易忽略这些将处理的一些常见边缘情况）。

在 numba 比其他解决方案快“一点”的情况下，您可能不想使用 numba 的另一个原因是：必须编译 Numba 函数，无论是提前还是第一次调用时，在某些情况下，编译将比你的收益慢得多，即使你调用它数百次。此外，编译时间加起来：numba 的导入速度很慢，编译 numba 函数也增加了一些开销。如果导入开销增加了 1-10 秒，那么削减几毫秒是没有意义的。

numba 安装起来也很复杂（至少没有 conda），所以如果你想共享你的代码，那么你有一个真正的“重度依赖”。

您的示例缺乏与 NumPy 方法和高度优化的纯 Python 版本的比较。我添加了更多比较函数并进行了基准测试（使用我的库simple_benchmark）：

import numpy as np
import numba as nb
from itertools import chain

def python_loop(a,b):
    result = 0.0
    for i,j in zip(a,b):
        result += (i+j)
    return result

@nb.njit
def numba_loop(a,b):
    result = 0.0
    for i,j in zip(a,b):
            result += (i+j)
    return result

def numpy_methods(a, b):
    return a.sum() + b.sum()

def python_sum(a, b):
    return sum(chain(a.tolist(), b.tolist()))

from simple_benchmark import benchmark, MultiArgument

arguments = {
    2**i: MultiArgument([np.zeros(2**i), np.zeros(2**i)])
    for i in range(2, 17)
}
b = benchmark([python_loop, numba_loop, numpy_methods, python_sum], arguments, warmups=[numba_loop])

%matplotlib notebook
b.plot()

Run Code Online (Sandbox Code Playgroud)

是的，numba 函数对于小数组是最快的，但是对于较长的数组，NumPy 解决方案会稍微快一些。Python 解决方案较慢，但“更快”的替代方案已经比您最初提出的解决方案快得多。

在这种情况下，我将简单地使用 NumPy 解决方案，因为它简短、可读且快速，除非您处理大量短数组并多次调用该函数 - 那么 numba 解决方案会明显更好。

Answer 2

max*_*111 6

如果您不确切知道显式输入和输出声明的结果是什么，请让 numba 来决定。根据您的输入，您可能想要使用'float64(float64[::1],float64[::1])'. （标量输出，连续输入数组）。如果您使用跨步输入调用显式声明的函数，它将失败，如果您让 Numba 完成这项工作，它只会重新编译。如果不使用fastmath=TrueSIMD，也无法使用SIMD，因为它会改变结果的精度。

计算至少 4 个部分和（256 位向量）并比计算这些部分和的总和在这里更好（Numpy 也不计算朴素和）。

使用 MSeiferts 优秀基准实用程序的示例

import numpy as np
import numba as nb
from itertools import chain

def python_loop(a,b):
    result = 0.0
    for i,j in zip(a,b):
        result += (i+j)
    return result

@nb.njit
def numba_loop_zip(a,b):
    result = 0.0
    for i,j in zip(a,b):
            result += (i+j)
    return result

#Your version with suboptimal input and output (prevent njit compilation) declaration
@nb.jit('float64[:](float64[:],float64[:])')
def numba_your_func(a,b):
    result = 0.0
    for i,j in zip(a,b):
            result += (i+j)
    return result

@nb.njit(fastmath=True)
def numba_loop_zip_fastmath(a,b):
    result = 0.0
    for i,j in zip(a,b):
            result += (i+j)
    return result

@nb.njit(fastmath=True)
def numba_loop_fastmath_single(a,b):
    result = 0.0
    size=min(a.shape[0],b.shape[0])
    for i in range(size):
            result += a[i]+b[i]
    return result

@nb.njit(fastmath=True,parallel=True)
def numba_loop_fastmath_multi(a,b):
    result = 0.0
    size=min(a.shape[0],b.shape[0])
    for i in nb.prange(size):
            result += a[i]+b[i]
    return result

#just for fun... single-threaded for small arrays,
#multithreaded for larger arrays
@nb.njit(fastmath=True,parallel=True)
def numba_loop_fastmath_combined(a,b):
    result = 0.0
    size=min(a.shape[0],b.shape[0])
    if size>2*10**4:
        result=numba_loop_fastmath_multi(a,b)
    else:
        result=numba_loop_fastmath_single(a,b)
    return result

def numpy_methods(a, b):
    return a.sum() + b.sum()

def python_sum(a, b):
    return sum(chain(a.tolist(), b.tolist()))

from simple_benchmark import benchmark, MultiArgument

arguments = {
    2**i: MultiArgument([np.zeros(2**i), np.zeros(2**i)])
    for i in range(2, 19)
}
b = benchmark([python_loop, numba_loop_zip, numpy_methods,numba_your_func, python_sum,numba_loop_zip_fastmath,numba_loop_fastmath_single,numba_loop_fastmath_multi,numba_loop_fastmath_combined], arguments, warmups=[numba_loop_zip,numba_loop_zip_fastmath,numba_your_func,numba_loop_fastmath_single,numba_loop_fastmath_multi,numba_loop_fastmath_combined])

%matplotlib notebook
b.plot()

Run Code Online (Sandbox Code Playgroud)

请注意，仅在某些特殊情况下才建议使用numba_loop_fastmath_multi或。numba_loop_fastmath_combined(a,b)通常，这样一个简单的函数是另一个问题的一部分，可以更有效地并行化（启动线程有一些开销）

归档时间：	6 年，8 月前
查看次数：	3296 次
最近记录：	6 年，8 月前