I have a for loop that takes a long time to run, and I would like to speed it up with the numba module.
My environment is:
win 10
python 3.7.5
anaconda 4.8.3
numpy 0.19.2
numba 0.46.0
The original code is:

import time

import numpy as np


def computePoints(dxFullCurve, rows, columns, direction, relativeOffset, cprSpacing):
    points = []
    for row in range(rows):
        p = dxFullCurve[row, :]
        for col in range(columns):
            # Shift the curve point along `direction` by a per-column offset.
            cprP = p.copy()
            cprP = cprP + direction * (col - columns / 2 - relativeOffset[row]) * cprSpacing
            points.append(cprP)
    return points


if __name__ == '__main__':
    dxFullCurve = np.random.random(size=[500, 3])
    direction = np.array([1, 0, 0])
    rows = 500
    columns = 500
    relativeOffset = np.random.random(size=500)
    cprSpacing = 0.1
    t1 = time.time()
    for i in range(100):
        computePoints(dxFullCurve, rows, columns, direction, relativeOffset, cprSpacing)
    t2 = time.time()
    # Average wall-clock time per call, in seconds.
    print('time: ', (t2 - t1) / 100)
The printed time is 0.8 (seconds per call).
Then I used numba to speed it up. The code is:

import numba as nb


@nb.jit()
def computePoints(dxFullCurve, rows, columns, direction, relativeOffset, cprSpacing):
    points = []
    for row in range(rows):
        p = dxFullCurve[row, :]
        for col in range(columns):
            cprP = p.copy()
            cprP = cprP + direction * (col - columns / 2 - relativeOffset[row]) * cprSpacing
            points.append(cprP)
    return points
Now the time is 0.177. numba does speed things up, but only by a factor of about 4. Is there any way to make it faster?
Then I tried numba's parallel mode, as follows:

@nb.jit(nopython=True, parallel=True)
def computePoints(dxFullCurve, rows, columns, direction, relativeOffset, cprSpacing):
    points = []
    # Note: both loops use range, not nb.prange, so neither is marked as a parallel loop.
    for row in range(rows):
        p = dxFullCurve[row, :]
        for col in range(columns):
            cprP = p.copy()
            cprP = cprP + direction * (col - columns / 2 - relativeOffset[row]) * cprSpacing
            points.append(cprP)
    return points
However, the time is now 0.903. Surprisingly, that is even slower than the code without numba.
So my question is: is there any way to make my for loop faster?
This is a longer comment on @jmd_dk's answer. That answer is missing some important points that speed up the computation further.
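
parallel=True: enables parallelization. This only pays off when the runtime is large enough; do not use it if the function takes only a few µs.
fastmath=True: allows algebraic transformations. Numerically this can affect the result, so the programmer has to decide whether that is acceptable.
error_model='numpy': switches off the check for division by zero. This is only really relevant for true divisions (such as the / 2 here), which can then be optimized to * 0.5.
cache=True: if the function is called with inputs of the same data types, it is simply loaded from the cache after the interpreter is restarted. This is especially useful for more complex functions.

Example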

import numba as nb
import numpy as np


@nb.njit(fastmath=True, error_model="numpy", parallel=True)
def computePoints_nb_2(dxFullCurve, rows, columns, direction, relativeOffset, cprSpacing):
    assert dxFullCurve.shape[1] == 3
    assert direction.shape[0] == 3

    # One flat (rows*columns, 3) output array instead of a list of small arrays.
    points = np.empty((rows * columns, 3))
    for row in nb.prange(rows):
        for col in range(columns):
            for i in range(3):
                points[row * columns + col, i] = dxFullCurve[row, i] + direction[i] * (col - columns / 2 - relativeOffset[row]) * cprSpacing
    return points

If the memory allocation can be avoided

@nb.njit(fastmath=True, error_model="numpy", parallel=True)
def computePoints_nb_2_pre(dxFullCurve, rows, columns, direction, relativeOffset, cprSpacing, points):
    assert dxFullCurve.shape[1] == 3
    assert direction.shape[0] == 3
    assert points.shape[1] == 3

    # `points` is allocated by the caller and overwritten in place.
    for row in nb.prange(rows):
        for col in range(columns):
            for i in range(3):
                points[row * columns + col, i] = dxFullCurve[row, i] + direction[i] * (col - columns / 2 - relativeOffset[row]) * cprSpacing
    return points

Timings

# Implementation of jmd_dk
%timeit computePoints_nb_1(dxFullCurve, rows, columns, direction, relativeOffset, cprSpacing)
# 23.2 ms ± 213 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit computePoints_nb_2(dxFullCurve, rows, columns, direction, relativeOffset, cprSpacing)
# 1.54 ms ± 61.5 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit computePoints_nb_2_pre(dxFullCurve, rows, columns, direction, relativeOffset, cprSpacing, points)
# 122 µs ± 4.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
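
Because computePoints_nb_2 returns a single (rows*columns, 3) array rather than the list of small arrays built in the question, it can be worth checking that a rewritten kernel still produces the same values as the original loop. The following is a minimal sketch of such a check in plain NumPy. The names compute_points_loop and compute_points_numpy are hypothetical helpers introduced here for illustration; they do not appear in the question or in either answer, and the broadcasting version is a stand-in for the numba kernels, not one of the implementations timed above.

```python
import numpy as np


def compute_points_loop(dx_full_curve, rows, columns, direction, relative_offset, cpr_spacing):
    """Reference: the list-based double loop from the question (hypothetical name)."""
    points = []
    for row in range(rows):
        p = dx_full_curve[row, :]
        for col in range(columns):
            points.append(p + direction * (col - columns / 2 - relative_offset[row]) * cpr_spacing)
    return points


def compute_points_numpy(dx_full_curve, rows, columns, direction, relative_offset, cpr_spacing):
    """Broadcasting version (hypothetical name); returns a (rows*columns, 3) array."""
    cols = np.arange(columns)
    # scale[row, col] = (col - columns / 2 - relative_offset[row]) * cpr_spacing
    scale = (cols[None, :] - columns / 2 - relative_offset[:rows, None]) * cpr_spacing
    # (rows, 1, 3) + (rows, columns, 1) * (3,) -> (rows, columns, 3)
    points = dx_full_curve[:rows, None, :] + scale[:, :, None] * direction
    # Row-major reshape puts (row, col) at flat index row * columns + col.
    return points.reshape(rows * columns, 3)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rows, columns = 50, 40  # small sizes so the reference loop finishes quickly
    dx_full_curve = rng.random((rows, 3))
    direction = np.array([1, 0, 0])
    relative_offset = rng.random(rows)

    reference = np.array(compute_points_loop(dx_full_curve, rows, columns, direction, relative_offset, 0.1))
    candidate = compute_points_numpy(dx_full_curve, rows, columns, direction, relative_offset, 0.1)
    print("same values:", np.allclose(reference, candidate))
```

The comparison uses np.allclose rather than exact equality because reordering the multiplications (and, for the numba kernels, fastmath=True) may change the last bits of the result. The same comparison could be applied to the output of computePoints_nb_2 or computePoints_nb_2_pre by swapping the candidate.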