iR0*_*Nic 4 python vectorization numba
我有一个对 float64(x,y) 上的二维矩阵进行操作的函数。基本概念:对于每个行组合(行数选择2),计算减法后正值的数量(行1 - 行2)。在 int64(y,y) 的 2Dmatrix 中,如果值高于某个阈值,则将该值存储在索引 [row1,row2] 中;如果低于某个阈值,则将该值存储在索引 [row2,row1] 中。
\n\n我已经实现了它并用 @njit(parallel=False) 装饰它,效果很好 @njit(parallel=True) 似乎没有加速。为了加快整个过程,我查看了@guvectorize,这也很有效。但是,在这种情况下,我也无法弄清楚如何将 @guvectorize 与并行 true 一起使用。
\n\n我查看了numba guvectorize target='parallel' 比 target='cpu' 慢,其中解决方案是使用 @vecorize 代替,但我无法将解决方案转移到我的问题,因此我现在正在寻求帮助:)
\n\n基本的抖动和向量化实现
\n\nimport numpy as np\nfrom numba import jit, guvectorize, prange\nimport timeit\n\n@jit(parallel=False)\ndef check_pairs_sg(raw_data):\n # 2D array to be filled\n result = np.full((len(raw_data), len(raw_data)), -1)\n\n # Iterate over all possible gene combinations\n for r1 in range(0, len(raw_data)):\n for r2 in range(r1+1, len(raw_data)):\n diff = np.subtract(raw_data[:, r1], raw_data[:, r2])\n\n num_pos = len(np.where(diff > 0)[0])\n\n # Arbitrary check to illustrate\n if num_pos >= 5: \n result[r1,r2] = num_pos\n else:\n result[r2,r1] = num_pos\n\n return result\n\n@jit(parallel=True)\ndef check_pairs_multi(raw_data):\n # 2D array to be filled\n result = np.full((len(raw_data), len(raw_data)), -1)\n\n # Iterate over all possible gene combinations\n for r1 in range(0, len(raw_data)):\n for r2 in prange(r1+1, len(raw_data)):\n diff = np.subtract(raw_data[:, r1], raw_data[:, r2])\n\n num_pos = len(np.where(diff > 0)[0])\n\n # Arbitrary check to illustrate\n if num_pos >= 5: \n result[r1,r2] = num_pos\n else:\n result[r2,r1] = num_pos\n\n return result\n\n@guvectorize(["void(float64[:,:], int64[:,:])"],\n "(n,m)->(m,m)", target=\'cpu\')\ndef check_pairs_guvec_sg(raw_data, result):\n for r1 in range(0, len(result)):\n for r2 in range(r1+1, len(result)):\n diff = np.subtract(raw_data[:, r1], raw_data[:, r2])\n\n num_pos = len(np.where(diff > 0)[0])\n\n # Arbitrary check to illustrate\n if num_pos >= 5: \n result[r1,r2] = num_pos\n else:\n result[r2,r1] = num_pos\n\n@guvectorize(["void(float64[:,:], int64[:,:])"],\n "(n,m)->(m,m)", target=\'parallel\')\ndef check_pairs_guvec_multi(raw_data, result):\n for r1 in range(0, len(result)):\n for r2 in range(r1+1, len(result)):\n diff = np.subtract(raw_data[:, r1], raw_data[:, r2])\n\n num_pos = len(np.where(diff > 0)[0])\n\n # Arbitrary check to illustrate\n if num_pos >= 5: \n result[r1,r2] = num_pos\n else:\n result[r2,r1] = num_pos\n\nif __name__=="__main__":\n np.random.seed(404)\n a = np.random.random((512,512)).astype(np.float64)\n res = np.full((len(a), len(a)), -1)\n\nRun Code Online (Sandbox Code Playgroud)\n\n并用测量
\n\n%timeit check_pairs_sg(a)\n%timeit check_pairs_multi(a)\n%timeit check_pairs_guvec_sg(a, res)\n%timeit check_pairs_guvec_multi(a, res)\nRun Code Online (Sandbox Code Playgroud)\n\n导致:
\n\n614 ms \xc2\xb1 2.54 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 1 loop each)\n507 ms \xc2\xb1 6.87 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 1 loop each)\n622 ms \xc2\xb1 3.88 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 1 loop each)\n671 ms \xc2\xb1 4.35 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 1 loop each)\nRun Code Online (Sandbox Code Playgroud)\n\n我一直在思考如何将其实现为 @vectorized 或适当的并行 @guvectorize,以真正并行地填充生成的 2D 数组。
\n\n我想这是我尝试将其进一步应用到 GPU 之前的第一步。
\n\n非常感谢任何帮助。
\n例如,考虑一下以下几行的或多或少完全相同的实现
\ndiff = np.subtract(raw_data[:, r1], raw_data[:, r2])\nnum_pos = len(np.where(diff > 0)[0])\nRun Code Online (Sandbox Code Playgroud)\n在C++中。
\n伪代码
\n您的代码中的主要问题是:
\n删除临时数组并简化
\n@nb.njit(parallel=False)\ndef check_pairs_simp(raw_data):\n # 2D array to be filled\n result = np.full((raw_data.shape[0],raw_data.shape[1]), -1)\n \n # Iterate over all possible gene combinations\n for r1 in range(0, raw_data.shape[1]):\n for r2 in range(r1+1, raw_data.shape[1]):\n num_pos=0\n for i in range(raw_data.shape[0]):\n if (raw_data[i,r1]>raw_data[i,r2]):\n num_pos+=1\n \n # Arbitrary check to illustrate\n if num_pos >= 5: \n result[r1,r2] = num_pos\n else:\n result[r2,r1] = num_pos\n \n return result\nRun Code Online (Sandbox Code Playgroud)\n删除临时数组并简化+连续内存访问
\n@nb.njit(parallel=False)\ndef check_pairs_simp_rev(raw_data_in):\n #Create a transposed array not just a view \n raw_data=np.ascontiguousarray(raw_data_in.T)\n \n # 2D array to be filled\n result = np.full((raw_data.shape[0],raw_data.shape[1]), -1)\n \n # Iterate over all possible gene combinations\n for r1 in range(0, raw_data.shape[0]):\n for r2 in range(r1+1, raw_data.shape[0]):\n num_pos=0\n for i in range(raw_data.shape[1]):\n if (raw_data[r1,i]>raw_data[r2,i]):\n num_pos+=1\n \n # Arbitrary check to illustrate\n if num_pos >= 5: \n result[r1,r2] = num_pos\n else:\n result[r2,r1] = num_pos\n \n return result\nRun Code Online (Sandbox Code Playgroud)\n删除临时数组并简化+连续内存访问+并行化
\n@nb.njit(parallel=True,fastmath=True)\ndef check_pairs_simp_rev_p(raw_data_in):\n #Create a transposed array not just a view \n raw_data=np.ascontiguousarray(raw_data_in.T)\n \n # 2D array to be filled\n result = np.full((raw_data.shape[0],raw_data.shape[1]), -1)\n \n # Iterate over all possible gene combinations\n for r1 in nb.prange(0, raw_data.shape[0]):\n for r2 in range(r1+1, raw_data.shape[0]):\n num_pos=0\n for i in range(raw_data.shape[1]):\n if (raw_data[r1,i]>raw_data[r2,i]):\n num_pos+=1\n \n # Arbitrary check to illustrate\n if num_pos >= 5: \n result[r1,r2] = num_pos\n else:\n result[r2,r1] = num_pos\n \n return result\nRun Code Online (Sandbox Code Playgroud)\n时间安排
\n%timeit check_pairs_sg(a)\n488 ms \xc2\xb1 8.68 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 1 loop each)\n%timeit check_pairs_simp(a)\n186 ms \xc2\xb1 3.83 ms per loop (mean \xc2\xb1 std. dev. of 7 runs, 10 loops each)\n%timeit check_pairs_simp_rev(a)\n12.1 ms \xc2\xb1 226 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 100 loops each)\n%timeit check_pairs_simp_rev_p(a)\n5.43 ms \xc2\xb1 49.1 \xc2\xb5s per loop (mean \xc2\xb1 std. dev. of 7 runs, 100 loops each)\nRun Code Online (Sandbox Code Playgroud)\n