使用Numba进行Numpy优化

use*_*496 7 python numpy numba

我在球体上有两组点,在下面的代码示例中标记为'obj'和'ps'.我想确定所有'obj'点比'ps'点的某个角距离更近.

我对此的看法是用3D单位向量表示每个点,并将它们的点积与cos(最大间距)进行比较.这可以通过numpy广播轻松完成,但在我的应用程序中我有n_obj~500,000和n_ps~50,000,因此广播的内存要求太大.下面我使用numba粘贴了我目前的拍摄.这可以进一步优化吗?

from numba import jit
import numpy as np
from sklearn.preprocessing import normalize

def gen_points(n):
    """
    generate random 3D unit vectors (not uniform, but irrelevant here)
    """
    vec = 2*np.random.rand(n,3)-1.
    vec_norm = normalize(vec)
    return vec_norm

#@jit(nopython=True)
@jit
def angdist_threshold_numba(vec_obj,vec_ps,cos_maxsep):
    """
    finds obj that are closer than maxsep to a ps
    """    
    nps = len(vec_ps)
    nobj = len(vec_obj)     

    #closeobj_all = []
    closeobj_all = np.empty(0)
    dotprod = np.empty(nobj)
    a = np.arange(nobj)
    for ps in range(nps):
        np.sum(vec_obj*vec_ps[ps],axis=1,out=dotprod)
        #closeobj_all.extend(a[dotprod > cos_maxsep])
        closeobj_all = np.append(closeobj_all, a[dotprod > cos_maxsep])  

    return closeobj_all


vec_obj = gen_points(50000) #in reality ~500,000
vec_ps = gen_points(5000) #in reality ~50,000
cos_maxsep = np.cos(0.003)

closeobj_all = np.unique(angdist_threshold_numba(vec_obj,vec_ps,cos_maxsep))
Run Code Online (Sandbox Code Playgroud)

这是使用代码中给出的测试用例的性能:

%timeit np.unique(angdist_threshold_numba(vec_obj,vec_ps,cos_maxsep))
1 loops, best of 3: 4.53 s per loop
Run Code Online (Sandbox Code Playgroud)

我试图加快使用速度

@jit(nopython=True)
Run Code Online (Sandbox Code Playgroud)

但这失败了

NotImplementedError: Failed at nopython (nopython frontend)
(<class 'numba.ir.Expr'>, build_list(items=[]))
Run Code Online (Sandbox Code Playgroud)

编辑:在numba更新到0.26之后,即使在python模式下,空列表的创建也会失败.这可以通过用np.empty(0)替换它来修复,而.extend()用np.append()替换,见上文.这几乎不会改变性能.

根据https://github.com/numba/numba/issues/858 np.empty()现在在nopython模式下受支持,但我仍然无法使用@jit(nopython = True)运行它:

TypingError: Internal error at <numba.typeinfer.CallConstraint object at 0x7ff3114a9310>
Run Code Online (Sandbox Code Playgroud)

小智 8

不像list.append你应该永远不要打电话numpy.append!这是因为即使附加单个元素,也需要复制整个数组.因为您只对唯一感兴趣,所以obj您可以使用布尔数组来标记到目前为止找到的匹配项.

至于Numba,如果你写出所有的循环,它最有效.例如:

@jit(nopython=True)
def numba2(vec_obj, vec_ps, cos_maxsep):
    nps = vec_ps.shape[0]
    nobj = vec_obj.shape[0]
    dim = vec_obj.shape[1]
    found = np.zeros(nobj, np.bool_)
    for i in range(nobj):
        for j in range(nps):
            cos = 0.0
            for k in range(dim):
                cos += vec_obj[i,k] * vec_ps[j,k]
            if cos > cos_maxsep:
                found[i] = True
                break
    return found.nonzero()
Run Code Online (Sandbox Code Playgroud)

额外的好处是,ps一旦我们找到与当前匹配的数据,我们就可以突破数组上的循环obj.

通过专门针对三维空间的功能,您可以获得更快的速度.此外,由于某种原因,将所有数组和相关维度传递给辅助函数会导致另一个加速:

def numba3(vec_obj, vec_ps, cos_maxsep):
    nps = len(vec_ps)
    nobj = len(vec_obj)
    out = np.zeros(nobj, bool)
    numba3_helper(vec_obj, vec_ps, cos_maxsep, out, nps, nobj)
    return np.flatnonzero(out)

@jit(nopython=True)
def numba3_helper(vec_obj, vec_ps, cos_maxsep, out, nps, nobj):
    for i in range(nobj):
        for j in range(nps):
            cos = (vec_obj[i,0]*vec_ps[j,0] + 
                   vec_obj[i,1]*vec_ps[j,1] + 
                   vec_obj[i,2]*vec_ps[j,2])
            if cos > cos_maxsep:
                out[i] = True
                break
    return out
Run Code Online (Sandbox Code Playgroud)

我获得20,000 obj和2,000的时间ps:

%timeit angdist_threshold_numba(vec_obj,vec_ps,cos_maxsep)
1 loop, best of 3: 2.99 s per loop
%timeit numba2(vec_obj, vec_ps, cos_maxsep)
1 loop, best of 3: 444 ms per loop
%timeit numba3(vec_obj, vec_ps, cos_maxsep)
10 loops, best of 3: 134 ms per loop
Run Code Online (Sandbox Code Playgroud)