Numba - 如何并行填充 2D 数组

iR0*_*Nic 4 python vectorization numba

我有一个对 float64(x,y) 上的二维矩阵进行操作的函数。基本概念:对于每个行组合(行数选择2),计算减法后正值的数量(行1 - 行2)。在 int64(y,y) 的 2Dmatrix 中,如果值高于某个阈值,则将该值存储在索引 [row1,row2] 中;如果低于某个阈值,则将该值存储在索引 [row2,row1] 中。

\n\n

我已经实现了它并用 @njit(parallel=False) 装饰它,效果很好 @njit(parallel=True) 似乎没有加速。为了加快整个过程,我查看了@guvectorize,这也很有效。但是,在这种情况下,我也无法弄清楚如何将 @guvectorize 与并行 true 一起使用。

\n\n

我查看了numba guvectorize target='parallel' 比 target='cpu' 慢,其中解决方案是使用 @vecorize 代替,但我无法将解决方案转移到我的问题,因此我现在正在寻求帮助:)

\n\n

基本的抖动和向量化实现

\n\n
import numpy as np\nfrom numba import jit, guvectorize, prange\nimport timeit\n\n@jit(parallel=False)\ndef check_pairs_sg(raw_data):\n    # 2D array to be filled\n    result = np.full((len(raw_data), len(raw_data)), -1)\n\n    # Iterate over all possible gene combinations\n    for r1 in range(0, len(raw_data)):\n        for r2 in range(r1+1, len(raw_data)):\n            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])\n\n            num_pos = len(np.where(diff > 0)[0])\n\n            # Arbitrary check to illustrate\n            if num_pos >= 5: \n               result[r1,r2] = num_pos\n            else:\n               result[r2,r1] = num_pos\n\n    return result\n\n@jit(parallel=True)\ndef check_pairs_multi(raw_data):\n    # 2D array to be filled\n    result = np.full((len(raw_data), len(raw_data)), -1)\n\n    # Iterate over all possible gene combinations\n    for r1 in range(0, len(raw_data)):\n        for r2 in prange(r1+1, len(raw_data)):\n            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])\n\n            num_pos = len(np.where(diff > 0)[0])\n\n            # Arbitrary check to illustrate\n            if num_pos >= 5: \n               result[r1,r2] = num_pos\n            else:\n               result[r2,r1] = num_pos\n\n    return result\n\n@guvectorize(["void(float64[:,:], int64[:,:])"],\n             "(n,m)->(m,m)", target=\'cpu\')\ndef check_pairs_guvec_sg(raw_data, result):\n    for r1 in range(0, len(result)):\n        for r2 in range(r1+1, len(result)):\n            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])\n\n            num_pos = len(np.where(diff > 0)[0])\n\n            # Arbitrary check to illustrate\n            if num_pos >= 5: \n               result[r1,r2] = num_pos\n            else:\n               result[r2,r1] = num_pos\n\n@guvectorize(["void(float64[:,:], int64[:,:])"],\n             "(n,m)->(m,m)", target=\'parallel\')\ndef check_pairs_guvec_multi(raw_data, result):\n    for r1 in range(0, len(result)):\n        for r2 in range(r1+1, len(result)):\n            diff = np.subtract(raw_data[:, r1], raw_data[:, r2])\n\n            num_pos = len(np.where(diff > 0)[0])\n\n            # Arbitrary check to illustrate\n            if num_pos >= 5: \n               result[r1,r2] = num_pos\n            else:\n               result[r2,r1] = num_pos\n\nif __name__=="__main__":\n     np.random.seed(404)\n     a = np.random.random((512,512)).astype(np.float64)\n     res = np.full((len(a), len(a)), -1)\n\n
Run Code Online (Sandbox Code Playgroud)\n\n

并用测量

\n\n
%timeit check_pairs_sg(a)\n%timeit check_pairs_multi(a)\n%timeit check_pairs_guvec_sg(a, res)\n%timeit check_pairs_guvec_multi(a, res)\n
Run Code Online (Sandbox Code Playgroud)\n\n

导致:

\n\n
614 ms \xc2\xb1 2.54 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 1 loop each)\n507 ms \xc2\xb1 6.87 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 1 loop each)\n622 ms \xc2\xb1 3.88 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 1 loop each)\n671 ms \xc2\xb1 4.35 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 1 loop each)\n
Run Code Online (Sandbox Code Playgroud)\n\n

我一直在思考如何将其实现为 @vectorized 或适当的并行 @guvectorize,以真正并行地填充生成的 2D 数组。

\n\n

我想这是我尝试将其进一步应用到 GPU 之前的第一步。

\n\n

非常感谢任何帮助。

\n

max*_*111 5

编写 Numba 代码时考虑其他编译语言

\n

例如,考虑一下以下几行的或多或少完全相同的实现

\n
diff = np.subtract(raw_data[:, r1], raw_data[:, r2])\nnum_pos = len(np.where(diff > 0)[0])\n
Run Code Online (Sandbox Code Playgroud)\n

在C++中。

\n

伪代码

\n
    \n
  • 分配一个数组 diff,循环 raw_data[i*size_dim_1+r1](循环索引为 i)
  • \n
  • 分配一个布尔数组,循环遍历整个数组 diff 并检查 diff[i]>0
  • \n
  • 循环布尔数组,获取 b_arr==True 的索引,并通过 vector::push_back() 将它们保存到向量中。
  • \n
  • 检查向量的大小
  • \n
\n

您的代码中的主要问题是:

\n
    \n
  • 为简单操作创建临时数组
  • \n
  • 非连续内存访问
  • \n
\n

优化代码

\n

删除临时数组并简化

\n
@nb.njit(parallel=False)\ndef check_pairs_simp(raw_data):\n    # 2D array to be filled\n    result = np.full((raw_data.shape[0],raw_data.shape[1]), -1)\n    \n    # Iterate over all possible gene combinations\n    for r1 in range(0, raw_data.shape[1]):\n        for r2 in range(r1+1, raw_data.shape[1]):\n            num_pos=0\n            for i in range(raw_data.shape[0]):\n                if (raw_data[i,r1]>raw_data[i,r2]):\n                    num_pos+=1\n            \n            # Arbitrary check to illustrate\n            if num_pos >= 5: \n               result[r1,r2] = num_pos\n            else:\n               result[r2,r1] = num_pos\n    \n    return result\n
Run Code Online (Sandbox Code Playgroud)\n

删除临时数组并简化+连续内存访问

\n
@nb.njit(parallel=False)\ndef check_pairs_simp_rev(raw_data_in):\n    #Create a transposed array not just a view \n    raw_data=np.ascontiguousarray(raw_data_in.T)\n    \n    # 2D array to be filled\n    result = np.full((raw_data.shape[0],raw_data.shape[1]), -1)\n    \n    # Iterate over all possible gene combinations\n    for r1 in range(0, raw_data.shape[0]):\n        for r2 in range(r1+1, raw_data.shape[0]):\n            num_pos=0\n            for i in range(raw_data.shape[1]):\n                if (raw_data[r1,i]>raw_data[r2,i]):\n                    num_pos+=1\n            \n            # Arbitrary check to illustrate\n            if num_pos >= 5: \n               result[r1,r2] = num_pos\n            else:\n               result[r2,r1] = num_pos\n    \n    return result\n
Run Code Online (Sandbox Code Playgroud)\n

删除临时数组并简化+连续内存访问+并行化

\n
@nb.njit(parallel=True,fastmath=True)\ndef check_pairs_simp_rev_p(raw_data_in):\n    #Create a transposed array not just a view \n    raw_data=np.ascontiguousarray(raw_data_in.T)\n    \n    # 2D array to be filled\n    result = np.full((raw_data.shape[0],raw_data.shape[1]), -1)\n    \n    # Iterate over all possible gene combinations\n    for r1 in nb.prange(0, raw_data.shape[0]):\n        for r2 in range(r1+1, raw_data.shape[0]):\n            num_pos=0\n            for i in range(raw_data.shape[1]):\n                if (raw_data[r1,i]>raw_data[r2,i]):\n                    num_pos+=1\n            \n            # Arbitrary check to illustrate\n            if num_pos >= 5: \n               result[r1,r2] = num_pos\n            else:\n               result[r2,r1] = num_pos\n    \n    return result\n
Run Code Online (Sandbox Code Playgroud)\n

时间安排

\n
%timeit check_pairs_sg(a)\n488 ms \xc2\xb1 8.68 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 1 loop each)\n%timeit check_pairs_simp(a)\n186 ms \xc2\xb1 3.83 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 10 loops each)\n%timeit check_pairs_simp_rev(a)\n12.1 ms \xc2\xb1 226 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 100 loops each)\n%timeit check_pairs_simp_rev_p(a)\n5.43 ms \xc2\xb1 49.1 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 100 loops each)\n
Run Code Online (Sandbox Code Playgroud)\n