Parallelizing a dictionary comprehension

jor*_*rto 3 python dictionary multiprocessing python-2.7

I have the following function and dictionary comprehension:

def function(name, params):
    results = fits.open(name)
    # ... do something more to results ...
    return results

dictionary = {name: function(name, params) for name in nameList}

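and I would like to parallelize it. Is there a simple way to do this?

Here, I saw that the multiprocessing module can be used, but I could not work out how to make it pass my results into my dictionary.

Note: if possible, please give an answer that can be applied to any function that returns a result.

Note 2: the function mainly manipulates FITS files and assigns the results to a class.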

UPDATE

So here is what finally worked for me (based on @code_onkel's answer):

import multiprocessing

def function(name, params):
    results = fits.open(name)
    # ... do something more to results ...
    return results

def function_wrapper(args):
    return function(*args)

params = [...,...,..., etc]    

# Use a tenth of the available cores, but never fewer than two workers.
p = multiprocessing.Pool(processes=max(2, multiprocessing.cpu_count() // 10))
args_generator = ((name, params) for name in names)

dictionary = dict(zip(names, p.map(function_wrapper, args_generator)))

Using tqdm works only partially: I cannot use my custom bar, because tqdm reverts to its default bar, which shows only the iteration count.

cod*_*kel 5

The dictionary comprehension itself cannot be parallelized. Here is an example of how to use the multiprocessing module with Python 2.7.

from __future__ import print_function
import time
import multiprocessing

params = [0.5]

def function(name, params):
    print('sleeping for', name)
    time.sleep(params[0])
    return time.time()

def function_wrapper(args):
    return function(*args)

names = list('onecharNAmEs')

p = multiprocessing.Pool(3)
args_generator = ((name, params) for name in names)
dictionary = dict(zip(names, p.map(function_wrapper, args_generator)))
print(dictionary)
p.close()

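This works with any function, but the restrictions of the multiprocessing module apply. Most importantly, the classes of the arguments and return values, as well as the function to be parallelized itself, must be defined at module level; otherwise the (de)serializer will not find them. The wrapper function is needed because function() takes two arguments, while Pool.map() accepts only a single iterable and therefore calls its function with exactly one argument per item (unlike the built-in map(), it does not accept several iterables).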
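The module-level requirement comes from pickling: multiprocessing sends the function and its arguments to the worker processes with pickle, which stores functions by name rather than by value. A minimal sketch for checking a callable before handing it to a pool; the helper names `is_picklable` and `make_local_function` are made up for this illustration:

```python
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_local_function():
    # Defined inside another function, so pickle cannot look it up by name.
    def local_function(x):
        return x
    return local_function


if __name__ == "__main__":
    print(is_picklable(len))                    # built-in, found by name
    print(is_picklable(lambda x: x))            # lambda has no importable name
    print(is_picklable(make_local_function()))  # nested function
```

A callable that fails this check will also fail when passed to Pool.map(), so moving it to module level is usually the fix.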

With Python 3.3 or later, this can be simplified by using Pool as a context manager together with the starmap() function.

from __future__ import print_function
import time
import multiprocessing

params = [0.5]

def function(name, params):
    print('sleeping for', name)
    time.sleep(params[0])
    return time.time()

names = list('onecharnamEs')

with multiprocessing.Pool(3) as p:
    args_generator = ((name, params) for name in names)
    dictionary = dict(zip(names, p.starmap(function, args_generator)))

print(dictionary)

Here is a more readable version of the with block:

with multiprocessing.Pool(3) as p:
    args_generator = ((name, params) for name in names)
    results = p.starmap(function, args_generator)
    name_result_tuples = zip(names, results)
    dictionary = dict(name_result_tuples)

Pool.map() is meant for functions that take a single argument, which is why Pool.starmap() was added in Python 3.3.
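When the extra argument is the same for every name, as params is here, functools.partial offers another way around the single-argument limit of Pool.map(): bind params once and map over the names directly. The following is a sketch for Python 3 and is not part of the original answer. The helper name `parallel_dict` and its `pool` parameter are assumptions for illustration, and it assumes that the function accepts `params` as a keyword argument.

```python
import functools
import multiprocessing


def function(name, params):
    # Stand-in for the real work; the result depends on both arguments.
    return name * params[0]


def parallel_dict(pool, func, names, params):
    """Build {name: func(name, params)} using pool.map().

    pool.map() returns results in input order, so zipping them with
    names pairs every name with its own result.
    """
    names = list(names)  # allow generators; names is iterated twice
    # Assumption: func takes its second argument under the keyword "params".
    bound = functools.partial(func, params=params)
    return dict(zip(names, pool.map(bound, names)))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on
    # platforms that start workers by importing the main module.
    with multiprocessing.Pool(3) as p:
        print(parallel_dict(p, function, ["a", "b", "c"], [2]))
```

Because the helper only relies on a map() method, a multiprocessing.pool.ThreadPool can be passed in place of a process pool, which may be convenient when the work is I/O-bound or while debugging.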