Ano*_*ous 14 python multiprocessing threadpool
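I am writing an algorithm in which the function Multiprocess is supposed to call another function, CreateMatrixMp(), in parallel on as many processes as there are CPUs. I have never done multiprocessing before and cannot tell which of the methods below is more efficient. By "efficient" I mean that CreateMatrixMp() may need to be called thousands of times. I have read all of the documentation for the Python multiprocessing module and have come down to these two possibilities:
First, using the Pool class: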
def MatrixHelper(self, args):
    return self.CreateMatrix(*args)

def Multiprocess(self, sigmaI, sigmaX):
    cpus = mp.cpu_count()
    print('Number of cpu\'s to process WM: %d' % cpus)
    poolCount = cpus*2
    args = [(sigmaI, sigmaX, i) for i in range(self.numPixels)]
    pool = mp.Pool(processes = poolCount, maxtasksperchild= 2)
    tempData = pool.map(self.MatrixHelper, args)
    pool.close()
    pool.join()
Second, using the Process class:
def Multiprocess(self, sigmaI, sigmaX):
    cpus = mp.cpu_count()
    print('Number of cpu\'s to process WM: %d' % cpus)
    processes = [mp.Process(target = self.CreateMatrixMp, args = (sigmaI, sigmaX, i,)) for i in range(self.numPixels)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
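Pool seems to be the better choice. I have read that it causes less overhead, and Process does not take the number of CPUs on the machine into account. The only problem is that using Pool this way gives me error after error: whenever I fix one, there is a new one underneath it. Process seems easier to implement, and for all I know it may be the better choice. What does your experience tell you?
If Pool should be used, am I right to choose map()? I would prefer that order is maintained. I have tempData = pool.map(...) because the map function is supposed to return a list of the results of every process. I am not sure how Process handles its returned data.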
小智 17
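I think the Pool class is typically more convenient, but it depends on whether you want your results ordered or unordered.
Say you want to create 4 random strings (e.g., this could be a random user ID generator or similar):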
import multiprocessing as mp
import random
import string

# Define an example function
def rand_string(length, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    rand_str = ''.join(random.choice(
                        string.ascii_lowercase
                        + string.ascii_uppercase
                        + string.digits)
                   for i in range(length))
    output.put(rand_str)

# The guard is needed on platforms that start child processes with "spawn"
if __name__ == '__main__':
    # Define an output queue
    output = mp.Queue()

    # Setup a list of processes that we want to run
    processes = [mp.Process(target=rand_string, args=(5, output)) for x in range(4)]

    # Run processes
    for p in processes:
        p.start()

    # Get process results from the output queue; drain it before join(),
    # because a child with buffered queue data does not exit until it is consumed
    results = [output.get() for p in processes]

    # Exit the completed processes
    for p in processes:
        p.join()

    print(results)

# Output
# ['yzQfA', 'PQpqM', 'SHZYV', 'PSNkD']
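Here, the order probably doesn't matter. I am not sure whether there is a better way to do it, but if I want to keep track of the results in the order in which the functions were called, I typically return tuples with an ID as the first item, e.g.,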
# Define an example function
def rand_string(length, pos, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    rand_str = ''.join(random.choice(
                        string.ascii_lowercase
                        + string.ascii_uppercase
                        + string.digits)
                   for i in range(length))
    output.put((pos, rand_str))

if __name__ == '__main__':
    output = mp.Queue()

    # Setup a list of processes that we want to run
    processes = [mp.Process(target=rand_string, args=(5, x, output)) for x in range(4)]

    # Run the processes and collect the results exactly as before
    for p in processes:
        p.start()
    results = [output.get() for p in processes]
    for p in processes:
        p.join()

    print(results)

# Output
# [(1, '5lUya'), (3, 'QQvLr'), (0, 'KAQo6'), (2, 'nj6Q0')]
This lets me put the results back in order:
results.sort()
results = [r[1] for r in results]
print(results)
# Output:
# ['KAQo6', '5lUya', 'nj6Q0', 'QQvLr']
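Now to your question: how is this different from the Pool class? You would typically prefer Pool.map, which returns an ordered list of results without the detour of creating tuples and sorting them by ID. So I would say it is typically more efficient.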
def cube(x):
    return x**3

pool = mp.Pool(processes=4)
results = pool.map(cube, range(1,7))
print(results)
# output:
# [1, 8, 27, 64, 125, 216]
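To make the ordering point concrete, here is a minimal sketch (not part of the original answer) in which the tasks are deliberately made to finish out of order. The worker slow_cube and its sleep times are hypothetical and exist only to shuffle completion order. Pool.map still returns the results in input order, while Pool.imap_unordered yields them as they complete.

```python
import multiprocessing as mp
import time


def slow_cube(x):
    """Cube x; smaller inputs sleep longer, so later tasks tend to finish first."""
    time.sleep(0.05 * (6 - x))  # hypothetical delay, only to shuffle completion order
    return x ** 3


def main():
    with mp.Pool(processes=4) as pool:
        # map() returns results in input order, regardless of completion order.
        ordered = pool.map(slow_cube, range(1, 7))
        # imap_unordered() yields results as they complete, so the order may vary.
        unordered = list(pool.imap_unordered(slow_cube, range(1, 7)))
    print(ordered)
    print(unordered)


if __name__ == "__main__":
    main()
```

The first list always matches the input order; the second contains the same values, but its order depends on scheduling.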
Equivalently, there is also an "apply" method:
pool = mp.Pool(processes=4)
results = [pool.apply(cube, args=(x,)) for x in range(1,7)]
print(results)
# output:
# [1, 8, 27, 64, 125, 216]
Both Pool.apply and Pool.map block the main program until the work has finished.
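One consequence of this blocking is worth spelling out: each Pool.apply call waits for its own result, so calling it in a loop runs the tasks one after another, whereas a single Pool.map call hands the whole iterable to the workers. The sketch below is only an illustration; slow_square and timed are hypothetical helpers, and the measured times will vary by machine.

```python
import multiprocessing as mp
import time


def slow_square(x):
    """Square x after a short artificial delay."""
    time.sleep(0.2)
    return x * x


def timed(label, fn):
    """Call fn(), print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {result} in {time.perf_counter() - start:.2f}s")
    return result


def main():
    with mp.Pool(processes=4) as pool:
        # Each apply() blocks until its single task is done, so these run serially.
        timed("apply in a loop",
              lambda: [pool.apply(slow_square, args=(x,)) for x in range(4)])
        # map() blocks only once, after distributing all tasks to the workers.
        timed("map", lambda: pool.map(slow_square, range(4)))


if __name__ == "__main__":
    main()
```

On a machine with several cores, the loop over apply should take roughly four times the per-task delay, while map should take about one.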
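Now, you also have Pool.apply_async and Pool.map_async. These do not block: they return an AsyncResult object right away, and you call get() on it once you need the result, which is essentially similar to the Process class above. The advantage is that they still give you the convenient apply and map functionality that you know from Python's built-in apply (a built-in in Python 2 only) and map.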
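A minimal sketch of the asynchronous variants, reusing the cube function from the earlier examples. Collecting the AsyncResult objects in submission order and calling get() on each one keeps the results ordered, with no ID tuples needed. The timeout value is an arbitrary choice for illustration.

```python
import multiprocessing as mp


def cube(x):
    return x ** 3


def main():
    with mp.Pool(processes=4) as pool:
        # Each apply_async() call returns an AsyncResult immediately.
        pending = [pool.apply_async(cube, args=(x,)) for x in range(1, 7)]
        # The main process is free to do other work here.
        # get() blocks until that particular result is ready.
        results = [p.get() for p in pending]
        print(results)

        # map_async() submits the whole iterable and returns one AsyncResult.
        async_result = pool.map_async(cube, range(1, 7))
        print(async_result.get(timeout=10))


if __name__ == "__main__":
    main()

# Output:
# [1, 8, 27, 64, 125, 216]
# [1, 8, 27, 64, 125, 216]
```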