Numba-compatible implementation of np.tile?

Phi*_*ahn 3 python numpy numba

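I'm working on some code for dehazing images, based on this paper, and I started from an abandoned Py2.7 implementation. Since then, particularly with Numba, I've made some real performance improvements (important, since I'll have to run this on 8K images).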

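I'm pretty sure my last significant performance bottleneck is the box-filter step (I've already shaved almost a minute off each image, but this last slow step still takes about 30 s per image), and I'm close to getting it to run as nopython in Numba: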

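import numpy as np
from numba import jit, njit, prange
# cp (presumably CuPy) and hasGPU are defined elsewhere in the original module

@njit  # Row dependencies mean this can't be parallel
def yCumSum(a):
    """
    Numba-based computation of the y-direction
    cumulative sum. Can't be parallel!
    """
    out = np.empty_like(a)
    out[0, :] = a[0, :]
    # plain range: each row depends on the previous one
    for i in range(1, a.shape[0]):
        out[i, :] = a[i, :] + out[i - 1, :]
    return out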

@njit(parallel= True)
def xCumSum(a):
    """
    Numba-based parallel computation
    of X-direction cumulative sum
    """
    out = np.empty_like(a)
    for i in prange(a.shape[0]):
        out[i, :] = np.cumsum(a[i, :])
    return out

@jit
def _boxFilter(m, r, gpu= hasGPU):
    if gpu:
        m = cp.asnumpy(m)
    out = __boxfilter__(m, r)
    if gpu:
        return cp.asarray(out)
    return out

@jit(fastmath= True)
def __boxfilter__(m, r):
    """
    Fast box filtering implementation, O(1) time.
    Parameters
    ----------
    m:  a 2-D matrix data normalized to [0.0, 1.0]
    r:  radius of the window considered
    Return
    -----------
    The filtered matrix m'.
    """
    #H: height, W: width
    H, W = m.shape
    #the output matrix m'
    mp = np.empty(m.shape)

    #cumulative sum over y axis
    ySum = yCumSum(m) #np.cumsum(m, axis=0)
    #copy the accumulated values of the windows in y
    mp[0:r+1,: ] = ySum[r:(2*r)+1,: ]
    #differences in y axis
    mp[r+1:H-r,: ] = ySum[(2*r)+1:,: ] - ySum[ :H-(2*r)-1,: ]
    mp[(-r):,: ] = np.tile(ySum[-1,: ], (r, 1)) - ySum[H-(2*r)-1:H-r-1,: ]

    #cumulative sum over x axis
    xSum = xCumSum(mp) #np.cumsum(mp, axis=1)
    #copy the accumulated values of the windows in x
    mp[:, 0:r+1] = xSum[:, r:(2*r)+1]
    #difference over x axis
    mp[:, r+1:W-r] = xSum[:, (2*r)+1: ] - xSum[:, :W-(2*r)-1]
    mp[:, -r: ] = np.tile(xSum[:, -1][:, None], (1, r)) - xSum[:, W-(2*r)-1:W-r-1]
    return mp
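For reference, here is a 1-D sketch of what the y-direction pass of __boxfilter__ computes: each output is the sum of the inputs in a window of radius r, clipped at the array ends, obtained from one cumulative sum and three cases (head, full windows, tail). The name window_sums_1d is hypothetical, and the sketch assumes r >= 1 and len(v) >= 2r + 1. In 1-D the tail case subtracts from a scalar, c[-1]; in the 2-D code that scalar becomes the whole last row ySum[-1, :], which is why np.tile appears there.

```python
import numpy as np


def window_sums_1d(v, r):
    """Hypothetical helper: sum of v[i-r : i+r+1] for every i, with the
    window clipped at both ends (assumes r >= 1 and len(v) >= 2r + 1)."""
    n = v.shape[0]
    c = np.cumsum(v)                                      # c[k] = v[0] + ... + v[k]
    out = np.empty(n, dtype=c.dtype)
    out[:r + 1] = c[r:2 * r + 1]                          # head: window starts at 0
    out[r + 1:n - r] = c[2 * r + 1:] - c[:n - 2 * r - 1]  # full-width windows
    out[n - r:] = c[-1] - c[n - 2 * r - 1:n - r - 1]      # tail: window ends at n-1
    return out


print(window_sums_1d(np.ones(5), 1))  # [2. 3. 3. 3. 2.]
```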

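There's a lot going on at the edges, but if I could make the tile operation a nopython call, I could run the entire boxfilter step in nopython mode and get a big performance boost. I'm reluctant to write something very specific, since I'd like to reuse this code elsewhere, but I wouldn't particularly object to limiting it to 2D. For whatever reason, I've been staring at this and don't know where to start.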
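Since a 2-D-only version would be acceptable, one option is a hand-written tile that handles only 2-D inputs with two repeat counts. The sketch below (tile2d is a hypothetical name) builds the output with nested loops and slice assignment, which should stay within nopython mode. The try/except fallback is an assumption added only so the sketch also runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: run as plain Python when Numba is absent
    def njit(func):
        return func


@njit
def tile2d(a, reps_rows, reps_cols):
    """Hypothetical restricted np.tile: for a 2-D array a, returns the same
    values as np.tile(a, (reps_rows, reps_cols))."""
    rows, cols = a.shape
    out = np.empty((rows * reps_rows, cols * reps_cols), dtype=a.dtype)
    # Copy a into each (rows x cols) block of the output grid.
    for i in range(reps_rows):
        for j in range(reps_cols):
            out[i * rows:(i + 1) * rows, j * cols:(j + 1) * cols] = a
    return out
```

With it, the two edge lines would use tile2d(ySum[-1:, :], r, 1) and tile2d(xSum[:, -1:], 1, r); slicing with -1: keeps both arrays 2-D, so no reshape is needed.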

mac*_*ist 8

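Fully reimplementing np.tile would be a bit too complicated, but unless I'm misreading, it looks like you only need to take a vector and repeat it r times along a different axis.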

A Numba-compatible way to do that is to write

y = x.repeat(r).reshape((-1, r))

Then x is repeated r times along the second dimension, so that y[i, j] == x[i].
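Wrapped in a jitted function, the one-liner might look like this. The name repeat_along_cols is hypothetical, and the try/except fallback exists only so the sketch also runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: plain-Python fallback when Numba is absent
    def njit(func):
        return func


@njit
def repeat_along_cols(x, r):
    # x.repeat(r) turns [a, b, ...] into [a, a, ..., b, b, ...] (each element r times);
    # reshaping to (-1, r) puts each run of r copies on its own row,
    # so y[i, j] == x[i] for every j.
    return x.repeat(r).reshape((-1, r))
```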

Example:

In [2]: x = np.arange(5)

In [3]: x.repeat(3).reshape((-1, 3))
Out[3]: 
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2],
       [3, 3, 3],
       [4, 4, 4]])

If you want x repeated along the first dimension instead, just take the transpose, y.T.
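Applying both forms to the question's filter, a fully nopython box sum might look like the sketch below. It is not the original author's code: box_sum is a hypothetical name, the cumulative sums are written as loops (the question's code likewise replaces np.cumsum(..., axis=...) with yCumSum and xCumSum), and it assumes r >= 1 and that both image dimensions are at least 2r + 1. Copying the strided last column with np.ascontiguousarray before repeat is a precaution that may not be strictly necessary.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: plain-Python fallback when Numba is absent
    def njit(func):
        return func


@njit
def box_sum(m, r):
    """Hypothetical nopython box sum with the same indexing as __boxfilter__."""
    H, W = m.shape
    mp = np.empty((H, W))
    # cumulative sum down the rows (axis 0)
    ySum = np.empty((H, W))
    ySum[0, :] = m[0, :]
    for i in range(1, H):
        ySum[i, :] = ySum[i - 1, :] + m[i, :]
    mp[0:r + 1, :] = ySum[r:2 * r + 1, :]
    mp[r + 1:H - r, :] = ySum[2 * r + 1:, :] - ySum[:H - 2 * r - 1, :]
    # np.tile(ySum[-1, :], (r, 1)) -> repeat/reshape, then transpose
    last_row = ySum[-1, :].repeat(r).reshape((-1, r)).T
    mp[-r:, :] = last_row - ySum[H - 2 * r - 1:H - r - 1, :]
    # cumulative sum along each row (axis 1)
    xSum = np.empty((H, W))
    xSum[:, 0] = mp[:, 0]
    for j in range(1, W):
        xSum[:, j] = xSum[:, j - 1] + mp[:, j]
    mp[:, 0:r + 1] = xSum[:, r:2 * r + 1]
    mp[:, r + 1:W - r] = xSum[:, 2 * r + 1:] - xSum[:, :W - 2 * r - 1]
    # np.tile(xSum[:, -1][:, None], (1, r)) -> repeat/reshape
    last_col = np.ascontiguousarray(xSum[:, -1])
    mp[:, -r:] = last_col.repeat(r).reshape((-1, r)) - xSum[:, W - 2 * r - 1:W - r - 1]
    return mp
```

The result is the window sum over a (2r+1) x (2r+1) neighbourhood clipped at the borders, so comparing it against a slow loop over slices on a small random array is an easy sanity check before running it on 8K images.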