I'm using `Parallel` from joblib in Python to train a CNN. The code is structured as follows:
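```python
from joblib import Parallel, delayed

# CRF, x, y, n, m and num_cores are defined elsewhere in my code
crf = CRF()
with Parallel(n_jobs=num_cores) as pal_worker:
    for epoch in range(n):
        temp = pal_worker(delayed(crf.runCRF)(x[i], y[i]) for i in range(m))
```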
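For reference, here is a minimal runnable sketch of the same structure. It assumes a hypothetical `run_crf` function standing in for `crf.runCRF` and small NumPy arrays in place of the real data. `temp_folder` and `max_nbytes` are real `Parallel` parameters: array arguments larger than `max_nbytes` are dumped to `temp_folder` and memory-mapped by the workers, so setting `temp_folder` explicitly controls where those files are written.

```python
import numpy as np
from joblib import Parallel, delayed


def run_crf(xi, yi):
    # Hypothetical stand-in for crf.runCRF: some per-sample work on two arrays.
    return float((xi * yi).sum())


def train(x, y, n_epochs, n_jobs=2, temp_folder=None):
    results = []
    # A single Parallel context is reused across epochs, as in the code above.
    # Arrays larger than max_nbytes are dumped to temp_folder and memory-mapped;
    # temp_folder=None lets joblib pick its default location.
    with Parallel(n_jobs=n_jobs, temp_folder=temp_folder, max_nbytes="1M") as parallel:
        for _ in range(n_epochs):
            epoch_out = parallel(delayed(run_crf)(x[i], y[i]) for i in range(len(x)))
            results.append(epoch_out)
    return results
```

Using the context manager keeps the same worker pool alive across epochs instead of starting new workers on every call, which is why the original code wraps the epoch loop in `with Parallel(...)`.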
The code runs successfully for 1 or 2 epochs and then fails with the following error (I've kept only the parts I think are important):
```
......
File "/data_shared/Docker/tsun/software/anaconda3/envs/pytorch04/lib/python3.5/site-packages/joblib/numpy_pickle.py", line 104, in write_array
pickler.file_handle.write(chunk.tostring('C'))
OSError: [Errno 28] No space left on device
"""
The above exception was the direct cause of the following exception:
return future.result(timeout=timeout)
File
......
_pickle.PicklingError: Could not pickle the task to send it to the workers.
```
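The `OSError` is raised in `numpy_pickle.write_array`, i.e. while joblib is writing an array to a temporary file, so the relevant free space is that of joblib's temporary folder rather than the data disk. The sketch below lists the candidate folders in the order the joblib documentation gives for `temp_folder=None` (the `JOBLIB_TEMP_FOLDER` environment variable, then `/dev/shm` if it exists and is writable, then the system temp directory) and reports their free space. It assumes a Linux host; `/dev/shm` is a RAM-backed filesystem there, and inside a Docker container it is only 64 MB by default. Newer joblib releases may also skip `/dev/shm` when it is small, so treat the order as approximate.

```python
import os
import shutil
import tempfile


def joblib_temp_candidates():
    # Lookup order documented for Parallel(temp_folder=None); newer joblib
    # releases may additionally skip /dev/shm when it has little free space.
    candidates = []
    env_folder = os.environ.get("JOBLIB_TEMP_FOLDER")
    if env_folder:
        candidates.append(env_folder)
    if os.path.isdir("/dev/shm") and os.access("/dev/shm", os.W_OK):
        candidates.append("/dev/shm")
    candidates.append(tempfile.gettempdir())  # honours TMPDIR / TEMP / TMP
    return candidates


def report_free_space(paths):
    # Map each existing folder to its free space in MiB.
    report = {}
    for path in paths:
        if os.path.isdir(path):
            report[path] = shutil.disk_usage(path).free / 2**20
    return report


if __name__ == "__main__":
    for path, free_mib in report_free_space(joblib_temp_candidates()).items():
        print(f"{path}: {free_mib:.1f} MiB free")
```

If the folder joblib would use turns out to be small, pointing `JOBLIB_TEMP_FOLDER` or the `temp_folder` argument of `Parallel` at a directory on the large disk is one thing to try.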
I'm confused, because the disk has plenty of free space and the program runs successfully for 1 or 2 …