Bin elements per row - Vectorized 2D Bincount for NumPy

Gri*_*ory 7 python performance numpy matrix vectorization

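I have a NumPy array of integer values. The values range from 0 to the largest element in the matrix (in other words, every number from 0 to the maximum is present in it). I need an efficient (that is, fast and fully vectorized) way to count the occurrences of each value in every row and encode the counts by value.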

I could not find a similar question, or one that helps solve this.

So, if I have this data as input:

# shape is (N0=4, m0=4) 
1   1   0   4
2   4   2   1
1   2   3   5
4   4   4   1

The desired output is:

# shape(N=N0, m=data.max()+1):
1   2   0   0   1   0
0   1   2   0   1   0
0   1   1   1   0   1
0   1   0   0   3   0

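I know how to solve this by simply iterating over data row by row, counting the unique values in each row, and then combining the results while accounting for every possible value in the data array.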

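When vectorizing this with NumPy, the key problem is that searching for each number one by one is slow, and with many unique numbers it is not an efficient solution. In general, both N and the number of unique values are fairly large (N tends to be larger than the number of unique values, by the way).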
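For illustration, the per-value search described above might look like the sketch below. The function name bincount2D_per_value is hypothetical and does not come from the original question. The Python loop runs once per possible value, which is why this approach scales poorly when there are many distinct values.

```python
import numpy as np

def bincount2D_per_value(a):
    """Count each value per row by scanning the whole array once per value."""
    n_bins = a.max() + 1
    out = np.empty((a.shape[0], n_bins), dtype=np.intp)
    for value in range(n_bins):
        # One full pass over `a` for every candidate value
        out[:, value] = (a == value).sum(axis=1)
    return out

data = np.array([[1, 1, 0, 4],
                 [2, 4, 2, 1],
                 [1, 2, 3, 5],
                 [4, 4, 4, 1]])
counts = bincount2D_per_value(data)
```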

Does anyone have a good idea?

Div*_*kar 12

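This is basically what np.bincount does for 1D arrays. Put simply, we would need to apply it to each row in a loop. To vectorize it, we can offset each row by a multiple of N = a.max()+1, so that row i is shifted by i*N. The idea is to give each row its own set of bins, so that rows are not affected by elements in other rows that hold the same number.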

Hence, the implementation would be -

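# Vectorized solution
import numpy as np

def bincount2D_vectorized(a):
    N = a.max()+1  # number of bins per row
    # Shift row i by i*N so that each row gets its own block of N bins
    a_offs = a + np.arange(a.shape[0])[:,None]*N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1,N)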

Sample run -

In [189]: a
Out[189]: 
array([[1, 1, 0, 4],
       [2, 4, 2, 1],
       [1, 2, 3, 5],
       [4, 4, 4, 1]])

In [190]: bincount2D_vectorized(a)
Out[190]: 
array([[1, 2, 0, 0, 1, 0],
       [0, 1, 2, 0, 1, 0],
       [0, 1, 1, 1, 0, 1],
       [0, 1, 0, 0, 3, 0]])
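To make the offsetting step concrete, here is a minimal sketch that walks through the intermediate arrays for the sample input. The variable names row_offsets and flat_counts are introduced only for this illustration.

```python
import numpy as np

a = np.array([[1, 1, 0, 4],
              [2, 4, 2, 1],
              [1, 2, 3, 5],
              [4, 4, 4, 1]])

N = a.max() + 1                                    # 6 bins per row
row_offsets = np.arange(a.shape[0])[:, None] * N   # column vector: 0, 6, 12, 18
a_offs = a + row_offsets                           # row i now occupies [i*N, (i+1)*N)

# A single 1D bincount over the shifted values counts all rows at once
flat_counts = np.bincount(a_offs.ravel(), minlength=a.shape[0] * N)
counts = flat_counts.reshape(-1, N)                # back to one row of bins per input row

print(a_offs)
print(counts)
```

Because row i only produces values in its own block of N bins, equal numbers in different rows never land in the same bin. The minlength argument pads the result so the reshape works even when the last row does not contain the maximum value.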

Numba Tweaks

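We can bring in numba for further speedup. Numba allows a few tweaks.

  • First, it allows JIT compilation.

  • Also, they recently introduced the experimental parallel option, which automatically parallelizes operations in a function that are known to have parallel semantics.

  • The final tweak is to use prange as a substitute for range. The docs state that this runs loops in parallel, similar to OpenMP parallel for loops and Cython's prange. prange performs well with larger datasets, probably because of the overhead needed to set up the parallel work.

So, with these two new tweaks along with njit for no-Python mode, we have three variants -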

# Numba solutions
import numpy as np
from numba import njit, prange

def bincount2D_numba(a, use_parallel=False, use_prange=False):
    N = a.max()+1
    m,n = a.shape
    out = np.zeros((m,N),dtype=int)

    # Choose function based on args
    func = bincount2D_numba_func0
    if use_parallel:
        if use_prange:
            func = bincount2D_numba_func2
        else:
            func = bincount2D_numba_func1
    # Run chosen function on input data and output
    func(a, out, m, n)
    return out

@njit
def bincount2D_numba_func0(a, out, m, n):
    for i in range(m):
        for j in range(n):
            out[i,a[i,j]] += 1

@njit(parallel=True)
def bincount2D_numba_func1(a, out, m, n):
    for i in range(m):
        for j in range(n):
            out[i,a[i,j]] += 1

@njit(parallel=True)
def bincount2D_numba_func2(a, out, m, n):
    # Numba parallelizes only the outermost prange; the nested one runs serially
    for i in prange(m):
        for j in prange(n):
            out[i,a[i,j]] += 1

For completeness and for testing later, the loopy version would be -

# Loopy solution
def bincount2D_loopy(a):
    N = a.max()+1
    m,n = a.shape
    out = np.zeros((m,N),dtype=int)
    for i in range(m):
        out[i] = np.bincount(a[i], minlength=N)
    return out 
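Before timing the variants, it can be worth confirming that they produce identical results. The sketch below compares the loopy and vectorized versions on random input. The helper check_equivalent and its parameters are hypothetical and added only for this check, and the numba variants are left out so the snippet runs with NumPy alone.

```python
import numpy as np

def bincount2D_vectorized(a):
    N = a.max() + 1
    a_offs = a + np.arange(a.shape[0])[:, None] * N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0] * N).reshape(-1, N)

def bincount2D_loopy(a):
    N = a.max() + 1
    out = np.zeros((a.shape[0], N), dtype=int)
    for i in range(a.shape[0]):
        out[i] = np.bincount(a[i], minlength=N)
    return out

def check_equivalent(shape, high, seed=0):
    """Return True if both implementations agree on a random non-negative array."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, high, size=shape)
    return np.array_equal(bincount2D_loopy(a), bincount2D_vectorized(a))

print(check_equivalent((100, 100), 100))
print(check_equivalent((50, 200), 1000))
```

Both versions rely on np.bincount, so they expect non-negative integer input.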

Runtime tests

Case #1:

In [312]: a = np.random.randint(0,100,(100,100))

In [313]: %timeit bincount2D_loopy(a)
     ...: %timeit bincount2D_vectorized(a)
     ...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True)
10000 loops, best of 3: 115 µs per loop
10000 loops, best of 3: 36.7 µs per loop
10000 loops, best of 3: 22.6 µs per loop
10000 loops, best of 3: 22.7 µs per loop
10000 loops, best of 3: 39.9 µs per loop

Case #2:

In [316]: a = np.random.randint(0,100,(1000,1000))

In [317]: %timeit bincount2D_loopy(a)
     ...: %timeit bincount2D_vectorized(a)
     ...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True)
100 loops, best of 3: 2.97 ms per loop
100 loops, best of 3: 3.54 ms per loop
1000 loops, best of 3: 1.83 ms per loop
100 loops, best of 3: 1.78 ms per loop
1000 loops, best of 3: 1.4 ms per loop

Case #3:

In [318]: a = np.random.randint(0,1000,(1000,1000))

In [319]: %timeit bincount2D_loopy(a)
     ...: %timeit bincount2D_vectorized(a)
     ...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True)
100 loops, best of 3: 4.01 ms per loop
100 loops, best of 3: 4.86 ms per loop
100 loops, best of 3: 3.21 ms per loop
100 loops, best of 3: 3.18 ms per loop
100 loops, best of 3: 2.45 ms per loop

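It seems the numba variants perform very well. Choosing one of the three variants would depend on the input array shape and, to some extent, on the number of unique elements in it.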
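Since the best choice depends on the input, it may help to time the candidates on data shaped like your own. The following is a rough sketch using the standard timeit module outside IPython. The helper best_time and its defaults are assumptions made for this example, and only the NumPy-based variants are included so that it runs without numba.

```python
import timeit
import numpy as np

def bincount2D_vectorized(a):
    N = a.max() + 1
    a_offs = a + np.arange(a.shape[0])[:, None] * N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0] * N).reshape(-1, N)

def bincount2D_loopy(a):
    N = a.max() + 1
    out = np.zeros((a.shape[0], N), dtype=int)
    for i in range(a.shape[0]):
        out[i] = np.bincount(a[i], minlength=N)
    return out

def best_time(func, a, repeat=3, number=10):
    """Best average time per call, in seconds, over `repeat` timing runs."""
    timings = timeit.repeat(lambda: func(a), repeat=repeat, number=number)
    return min(timings) / number

rng = np.random.default_rng(0)
# (shape, exclusive upper bound of the values), mirroring the cases above
cases = [((100, 100), 100), ((1000, 1000), 100), ((1000, 1000), 1000)]
for shape, high in cases:
    a = rng.integers(0, high, size=shape)
    for func in (bincount2D_loopy, bincount2D_vectorized):
        print(shape, high, func.__name__, f"{best_time(func, a) * 1e3:.3f} ms")
```

Absolute numbers will differ from the timings quoted above, since they depend on the machine and on the NumPy version.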