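I wrote a simple test that parallelizes a loop with numba and prange. For some reason, using more threads makes it slower rather than faster. Why does this happen? (CPU: Ryzen 7 2700X, 8 cores / 16 threads, 3.7 GHz)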

    import numpy as np  # required by np.empty below
    from numba import njit, prange, set_num_threads, get_num_threads

    @njit(parallel=True, fastmath=True)
    def test1():
        x = np.empty((10, 10))
        # outer loop is split across threads; only 100 elements in total
        for i in prange(10):
            for j in range(10):
                x[i, j] = i + j

    Number of threads : 1
    897 ns ± 18.3 ns per loop (mean ± std. dev. of 10 runs, 100000 loops each)
    Number of threads : 2
    1.68 µs ± 262 ns per loop (mean ± std. dev. of 10 runs, 100000 loops each)
    Number of threads : 3
    2.4 µs ± 163 ns …

When I run this program in parallel with numba's njit, I noticed that using more threads makes little difference. Going from 1 to 5 threads the time does improve (as expected), but beyond that it gets slower. Why does this happen?

    from numba import njit, prange, set_num_threads, get_num_threads
    import numpy as np

    @njit(parallel=True)
    def test(x, y):
        z = np.empty((x.shape[0], x.shape[0]), dtype=np.float64)
        for i in prange(x.shape[0]):
            for j in range(x.shape[0]):
                z[i, j] = x[i, j] * y[i, j]
        return z

    x = np.random.rand(10000, 10000)
    y = np.random.rand(10000, 10000)
    for i in range(16):
        set_num_threads(i + 1)
        print("Number of threads :", get_num_threads())
        %timeit -r 1 -n 10 test(x, y)

    Number of threads : 1
    234 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 10 loops each)
    Number of threads : 2
    178 ms ± 0 ns per loop (mean ± std. …
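
One way to read the timings for the 10000 x 10000 case is to estimate how much memory each call touches. The sketch below is only a back-of-the-envelope calculation, not a measurement: the helper name `implied_bandwidth_gb_s` is made up for illustration, and it assumes each of the three float64 arrays (`x`, `y`, `z`) is moved through memory exactly once per call, ignoring caches, page faults from `np.empty`, and any extra write traffic.

```python
def implied_bandwidth_gb_s(n, seconds, arrays=3, itemsize=8):
    """Return the GB/s implied if `arrays` n-by-n arrays are each touched once.

    Hypothetical helper: `arrays=3` stands for x, y and z; `itemsize=8`
    is the size of one float64 element in bytes.
    """
    total_bytes = arrays * n * n * itemsize
    return total_bytes / seconds / 1e9


# Timings copied from the output above (1 and 2 threads only;
# the remaining rows are truncated in the post).
for threads, seconds in [(1, 0.234), (2, 0.178)]:
    rate = implied_bandwidth_gb_s(10_000, seconds)
    print(f"{threads} thread(s): about {rate:.1f} GB/s")
```

Under these assumptions each call moves about 2.4 GB, which works out to roughly 10 GB/s with one thread and 13.5 GB/s with two. The loop does one multiplication per 24 bytes moved, so it is plausible (though not confirmed here) that memory bandwidth rather than core count limits the speedup once a few threads are active.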