How to use Python multiprocessing Pool.map to fill a numpy array in a for loop

MoT*_*GGE 9 python arrays numpy pool multiprocessing

I want to fill a 2D numpy array in a for loop and use multiprocessing to speed up the computation.

import numpy
from multiprocessing import Pool


array_2D = numpy.zeros((20,10))
pool = Pool(processes = 4)

def fill_array(start_val):
    return range(start_val,start_val+10)

list_start_vals = range(40,60)
for line in xrange(20):
    array_2D[line,:] = pool.map(fill_array,list_start_vals)
pool.close()

print array_2D

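When I run it, Python starts 4 subprocesses that occupy 4 CPU cores, but execution never finishes and the array is never printed. If I try to write the array to disk instead, nothing happens either.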

Can anyone tell me why?

Sau*_*tro 5

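The following works. First, it is better to protect the main part of the code inside an if __name__ == '__main__': block to avoid strange side effects. Second, pool.map() returns a list containing the result of fill_array for each value in the iterable list_start_vals, in order, so you do not need to create array_2D beforehand.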

import numpy as np
from multiprocessing import Pool

def fill_array(start_val):
    return list(range(start_val, start_val+10))

if __name__=='__main__':
    pool = Pool(processes=4)
    list_start_vals = range(40, 60)
    array_2D = np.array(pool.map(fill_array, list_start_vals))
    pool.close() # ATTENTION HERE
    print(array_2D)

You may run into trouble with pool.close(). Following @hpaulj's comment, if you do, you can remove that line.
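If pool.close() causes problems, one alternative is to let a with block manage the pool's lifetime. This is a sketch that assumes Python 3.3 or later, where Pool can be used as a context manager. Leaving the with block calls terminate(), not close(). That is safe here because pool.map() blocks until every result has been returned. The build_array helper is hypothetical and exists only so the result is easy to check.

```python
import numpy as np
from multiprocessing import Pool


def fill_array(start_val):
    # One row: start_val, start_val+1, ..., start_val+9
    return list(range(start_val, start_val + 10))


def build_array(start_vals, processes=4):
    # Hypothetical helper. The with block terminates the pool on exit,
    # which is safe because pool.map() has already returned all results.
    with Pool(processes=processes) as pool:
        rows = pool.map(fill_array, start_vals)
    # pool.map() preserves input order, so row i comes from start_vals[i]
    return np.array(rows)


if __name__ == '__main__':
    array_2D = build_array(range(40, 60))
    print(array_2D.shape)  # (20, 10)
    print(array_2D)
```

The main guard is still required. Worker processes may re-import the main module, and they must not create pools of their own when they do.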


Ram*_*Ram 0

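The problem is caused by running pool.map in a for loop. The map() method is functionally equivalent to the built-in map(), except that the individual tasks run in parallel. So in your case, pool.map(fill_array, list_start_vals) is called 20 times, once per iteration of the for loop. Each call already processes all 20 start values in parallel and returns 20 rows of 10 values, which cannot be assigned to the single row array_2D[line,:]. Call pool.map once instead. The following code should work: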

Code:

#!/usr/bin/python

import numpy
from multiprocessing import Pool

def fill_array(start_val):
    return range(start_val,start_val+10)

if __name__ == "__main__":
    array_2D = numpy.zeros((20,10))
    pool = Pool(processes = 4)    
    list_start_vals = range(40,60)

    # running the pool.map in a for loop is wrong
    #for line in xrange(20):
    #    array_2D[line,:] = pool.map(fill_array,list_start_vals)

    # get the result of pool.map (list of values returned by fill_array)
    # in a pool_result list 
    pool_result = pool.map(fill_array,list_start_vals)

    # the pool is processing its inputs in parallel, close() and join() 
    #can be used to synchronize the main process 
    #with the task processes to ensure proper cleanup.
    pool.close()
    pool.join()

    # Now assign the pool_result to your numpy
    for line,result in enumerate(pool_result):
        array_2D[line,:] = result

    print(array_2D)
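You may still want to fill a preallocated array row by row inside a for loop, as in the question. One possible sketch (assuming Python 3) iterates over pool.imap() and assigns each result as it arrives. Unlike map(), imap() returns a lazy iterator rather than a list, but it still yields results in input order, so enumerate() gives the matching row index. The fill_rows helper and its n_cols parameter are hypothetical. n_cols must match the row length that fill_array produces.

```python
import numpy as np
from multiprocessing import Pool


def fill_array(start_val):
    # One row of 10 consecutive integers starting at start_val
    return list(range(start_val, start_val + 10))


def fill_rows(start_vals, n_cols=10, processes=4):
    # Hypothetical helper: preallocate, then fill one row per result.
    start_vals = list(start_vals)
    array_2D = np.zeros((len(start_vals), n_cols))
    with Pool(processes=processes) as pool:
        # Iterate inside the with block: results must be consumed
        # before the pool is terminated on exit.
        for line, row in enumerate(pool.imap(fill_array, start_vals)):
            array_2D[line, :] = row
    return array_2D


if __name__ == '__main__':
    print(fill_rows(range(40, 60)))
```

Compared with collecting pool.map() into a list first, this version lets the loop start working on early rows before the later ones are computed. The end result is the same 20 x 10 array.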