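I have a 25 GB pickle containing a dictionary of numpy arrays. The dictionary looks like this:
Keys are strings such as "109c3708-3b0c-4868-a647-b9feb306c886_1"; values are numpy arrays of shape 200x23 and dtype float64. When I load the data repeatedly with pickle in a loop, the load time gets slower on each iteration (see the code and results below). What could be causing this?
Code: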
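import pickle
import time

def load_pickle(file: int) -> dict:
    # Load one pickled batch (a dict of numpy arrays) from disk.
    with open(f"D:/data/batched/{file}.pickle", "rb") as handle:
        return pickle.load(handle)

for i in range(0, 9):
    print(f"\nIteration {i}")

    # Drop the reference to the previously loaded dict and time the release.
    start_time = time.time()
    file = None
    print(f"Unloaded file in {time.time() - start_time:.2f} seconds")

    # Load the same file again and time the load.
    start_time = time.time()
    file = load_pickle(0)
    print(f"Loaded file in {time.time() - start_time:.2f} seconds")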
Results:
Iteration 0
Unloaded file in 0.00 seconds
Loaded file in 18.80 seconds
Iteration 1
Unloaded file in 14.78 seconds
Loaded file in 30.51 seconds
Iteration 2
Unloaded file in 28.67 seconds
Loaded file in 30.21 seconds
Iteration 3
Unloaded file in 35.38 seconds
Loaded file in 40.25 seconds
Iteration 4
Unloaded file in 39.91 seconds
Loaded file in 41.24 seconds
Iteration 5
Unloaded file in 43.25 seconds
Loaded file in 45.57 seconds
Iteration 6
Unloaded file in 46.94 seconds
Loaded file in 48.19 seconds
Iteration 7
Unloaded file in 51.67 seconds
Loaded file in 51.32 seconds
Iteration 8
Unloaded file in 55.25 seconds
Loaded file in 56.11 seconds
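Notes:
- RAM usage falls during the unload step (file = None discards the previous data held in the file variable), then rises again during loading. Both the unload and the load step seem to slow down over time. What surprises me is how slowly RAM drops during the unload step.
- I tried del file and gc.collect(), but neither sped anything up.
- If I change return pickle.load(handle) to return handle.read(), the unload time is consistently 0.45 seconds and the load time is consistently 4.85 seconds.
- Python version: Python 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:51:29) [MSC v.1929 64 bit (AMD64)].

Any ideas? I am also willing to move away from pickle if there is an alternative with similar read speed that does not suffer from the problem above (I am not worried about compression).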
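To make the "alternative to pickle" part concrete, here is a rough sketch of one option I could try. It assumes that every value really has the same 200x23 float64 shape, so the whole dictionary can be stored as a single 3-D .npy array plus a parallel array of keys. The helper names save_stacked and load_stacked and the file naming scheme are made up for illustration, and I have not verified that this avoids the slowdown on the 25 GB data.

```python
import numpy as np

def save_stacked(data: dict, prefix: str) -> None:
    """Store all values as one 3-D array plus a parallel array of keys."""
    keys = list(data)
    # Assumption: every value has the same shape (e.g. 200x23), so stacking works.
    stacked = np.stack([data[k] for k in keys])
    np.save(f"{prefix}_values.npy", stacked)
    # A fixed-width unicode array is saved without pickling.
    np.save(f"{prefix}_keys.npy", np.array(keys))

def load_stacked(prefix: str, mmap: bool = False):
    """Return (index, values), where values[index[key]] is the array for key."""
    keys = np.load(f"{prefix}_keys.npy")
    # mmap_mode="r" maps the file instead of reading it fully into RAM.
    values = np.load(f"{prefix}_values.npy", mmap_mode="r" if mmap else None)
    index = {key: i for i, key in enumerate(keys.tolist())}
    return index, values
```

With this layout a lookup becomes values[index[key]]. The loaded result is one large array and one dictionary of integers rather than one small array object per key, which is the design difference from the pickled dictionary.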
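Edit: I have run the load/unload loop above for pickles of different sizes. The results below show the relative change in speed over time. For anything above 3 GB, the unload time starts to increase significantly.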
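For anyone who wants to reproduce the size comparison, a minimal sketch of how such a sweep could be set up follows. It is not the exact script I used: the helper names make_dict, time_load_unload and sweep are hypothetical, and the arrays are zero-filled placeholders of shape 200x23 (36,800 bytes each as float64), so the number of arrays controls the pickle size.

```python
import os
import pickle
import time

import numpy as np

def make_dict(n_arrays: int) -> dict:
    """Build a placeholder dictionary of n_arrays float64 arrays of shape 200x23."""
    return {f"key_{i}": np.zeros((200, 23)) for i in range(n_arrays)}

def time_load_unload(path: str, repeats: int = 3) -> list:
    """Return a list of (unload_seconds, load_seconds), one pair per repeat."""
    timings = []
    data = None
    for _ in range(repeats):
        start = time.perf_counter()
        data = None  # drop the previously loaded dictionary
        unload = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as handle:
            data = pickle.load(handle)
        load = time.perf_counter() - start

        timings.append((unload, load))
    return timings

def sweep(sizes, directory: str) -> dict:
    """Pickle one dictionary per size and time repeated load/unload cycles."""
    results = {}
    for n in sizes:
        path = os.path.join(directory, f"{n}.pickle")
        with open(path, "wb") as handle:
            pickle.dump(make_dict(n), handle, protocol=pickle.HIGHEST_PROTOCOL)
        results[n] = time_load_unload(path)
    return results
```

Calling sweep with increasing array counts and a scratch directory gives one list of timing pairs per size, which can then be normalised against the first repeat to compare relative slowdown.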