I wrote the following code to test numba's caching feature:
import numba
import numpy as np
import time
@numba.njit(cache=True)
def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result
a=np.random.random((1000,100))
print(time.time())
sum2d(a)
print(time.time())
print(time.time())
sum2d(a)
print(time.time())
Although some cache files are generated in the __pycache__ folder, the timings are always the same:
1576855294.8787484
1576855295.5378428
1576855295.5378428
1576855295.5388253
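This happens no matter how many times I run the script, which suggests that the first call to sum2d still spends extra time compiling on every run. So what are the cache files in the __pycache__ folder for?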
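The cache files mentioned here can be inspected directly. The following is a minimal sketch, assuming the script's directory contains the `__pycache__` folder and relying on numba's convention of writing an index file (`.nbi`) plus compiled data files (`.nbc`). The helper name `list_numba_cache_files` is hypothetical and not part of numba.

```python
from pathlib import Path


def list_numba_cache_files(script_dir="."):
    """Return numba cache files found in script_dir/__pycache__, sorted by path.

    Assumption: numba stores an index file (.nbi) and data files (.nbc) there.
    """
    cache_dir = Path(script_dir) / "__pycache__"
    if not cache_dir.is_dir():
        return []
    return sorted(
        p for p in cache_dir.iterdir() if p.suffix in {".nbi", ".nbc"}
    )


if __name__ == "__main__":
    # Print each cache file with its size in bytes.
    for path in list_numba_cache_files("."):
        print(f"{path.name}  {path.stat().st_size} bytes")
```

If the list is empty after a run, nothing was cached; if files are present, a later run of the same script can load the compiled code instead of recompiling it.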
Jac*_*din 10
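The script below illustrates what cache=True does. It first calls a non-cached dummy function to absorb the time numba needs to initialize. It then calls sum2d twice without caching and twice with caching.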
import numba
import numpy as np
import time
@numba.njit
def dummy():
    return None

@numba.njit
def sum2d_nocache(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result

@numba.njit(cache=True)
def sum2d_cache(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result
start = time.time()
dummy()
end = time.time()
print(f'Dummy timing {end - start}')
a=np.random.random((1000,100))
start = time.time()
sum2d_nocache(a)
end = time.time()
print(f'No cache 1st timing {end - start}')
a=np.random.random((1000,100))
start = time.time()
sum2d_nocache(a)
end = time.time()
print(f'No cache 2nd timing {end - start}')
a=np.random.random((1000,100))
start = time.time()
sum2d_cache(a)
end = time.time()
print(f'Cache 1st timing {end - start}')
a=np.random.random((1000,100))
start = time.time()
sum2d_cache(a)
end = time.time()
print(f'Cache 2nd timing {end - start}')
Output after the first run:
Dummy timing 0.10361385345458984
No cache 1st timing 0.08893513679504395
No cache 2nd timing 0.00020122528076171875
Cache 1st timing 0.08929300308227539
Cache 2nd timing 0.00015544891357421875
Output after the second run:
Dummy timing 0.08973526954650879
No cache 1st timing 0.0809786319732666
No cache 2nd timing 0.0001163482666015625
Cache 1st timing 0.0016787052154541016
Cache 2nd timing 0.0001163482666015625
What does this output tell us?
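The initialization time of numba is not negligible (about 0.1 s for the dummy call in both runs). In the first run, the first call is slow with or without the cache (about 0.09 s), because the function has to be compiled either way. In the second run, the first call to the cached function is much faster (about 0.002 s instead of 0.08 s), which is exactly what cache=True is for. The purpose of cache=True is to avoid repeating the compilation of large, complex functions on every run of a script. In this example the function is simple, so the saving is small, but for a script with many more complex functions, caching can significantly reduce the run time.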
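The repeated start/end timing pattern used in this comparison can be folded into a small helper. This is only a sketch: `time_calls` and `sum2d_python` are hypothetical names introduced for illustration, and it uses `time.perf_counter()` rather than `time.time()` on the assumption that a higher-resolution clock is preferable for sub-millisecond calls. The plain-Python `sum2d_python` is a stand-in so the block runs without numba; in practice a function such as `sum2d_cache` would be passed instead.

```python
import time


def time_calls(func, *args, repeats=2):
    """Call func(*args) `repeats` times and return each call's duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        durations.append(time.perf_counter() - start)
    return durations


def sum2d_python(rows):
    """Plain-Python stand-in for sum2d (hypothetical; swap in a numba function)."""
    result = 0.0
    for row in rows:
        for value in row:
            result += value
    return result


if __name__ == "__main__":
    data = [[1.0] * 100 for _ in range(1000)]
    first, second = time_calls(sum2d_python, data)
    print(f"1st call {first:.6f} s, 2nd call {second:.6f} s")
```

With a numba-compiled function, the first duration includes compilation (or loading from the cache), while the second reflects only the execution of the already compiled code.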