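I am trying to write my data (from a single file in HDF5 format) to multiple files, and it works fine when the tasks are executed serially. Now I want to improve efficiency, so I modified the code to use the multiprocessing module, but the output is sometimes wrong. Here is a simplified version of my code.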
import multiprocessing as mp
import numpy as np
import math, h5py, time

N = 4  # number of processes to use
block_size = 300
data_sz = 678
dataFile = 'mydata.h5'

# fake some data
mydata = np.zeros((data_sz, 1))
for i in range(data_sz):
    mydata[i, 0] = i+1
h5file = h5py.File(dataFile, 'w')
h5file.create_dataset('train', data=mydata)

# fire multiple workers
pool = mp.Pool(processes=N)
total_part = int(math.ceil(1. * data_sz / block_size))
for i in range(total_part):
    pool.apply_async(data_write_func, args=(dataFile, i, ))
pool.close()
pool.join()
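
To make the partitioning concrete, here is a minimal sketch of the row ranges the loop above is presumably meant to cover. `block_ranges` is a hypothetical helper, not part of the original code, and it assumes worker `i` handles rows `i * block_size` up to (but not including) `min((i + 1) * block_size, data_sz)`.

```python
import math


def block_ranges(data_sz, block_size):
    """Return the assumed (start, stop) row range for each worker index.

    Hypothetical helper: the real data_write_func may slice differently.
    """
    # Same block count as in the question's code.
    total_part = int(math.ceil(1. * data_sz / block_size))
    return [(i * block_size, min((i + 1) * block_size, data_sz))
            for i in range(total_part)]


# With the values from the question: 678 rows in blocks of 300.
for i, (start, stop) in enumerate(block_ranges(678, 300)):
    print(i, start, stop)
```

With `data_sz = 678` and `block_size = 300` this gives three blocks, the last one shorter than the others, so a correct run should produce three output files that together hold all 678 rows.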
And …