Python multiprocessing: is it possible to have a pool inside of a pool?

rho*_*ron 16 python multiprocessing

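I have a module A that does a basic map/reduce: it takes data, sends it to modules B, C, D, etc. for analysis, and then joins their results together.

But it appears that modules B, C, D, etc. cannot themselves create a multiprocessing pool, or else I get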

AssertionError: daemonic processes are not allowed to have children

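Is it possible to parallelize these jobs some other way?

For clarity, here is an (admittedly bad) baby example. (I would normally use try/except, but you get the gist.)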

A.py:

  import B
  from multiprocessing import Pool

  def main():
    p = Pool()
    results = p.map(B.foo,range(10))
    p.close()
    p.join()
    return results


B.py:

  from multiprocessing import Pool

  def foo(x):
    p = Pool()
    results = p.map(str, range(x))  # x is an int, so map over range(x)
    p.close()
    p.join()
    return results

jfs*_*jfs 22

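Is it possible to have a pool inside of a pool?

Yes, it is possible, though it might not be a good idea unless you want to raise an army of zombies. From Python Process Pool non-daemonic?: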

import multiprocessing.pool
from contextlib import closing
from functools import partial

class NoDaemonProcess(multiprocessing.Process):
    # make 'daemon' attribute always return False
    def _get_daemon(self):
        return False
    def _set_daemon(self, value):
        pass
    daemon = property(_get_daemon, _set_daemon)

# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
class Pool(multiprocessing.pool.Pool):
    Process = NoDaemonProcess

def foo(x, depth=0):
    if depth == 0:
        return x
    else:
        with closing(Pool()) as p:
            return p.map(partial(foo, depth=depth-1), range(x + 1))

if __name__ == "__main__":
    from pprint import pprint
    pprint(foo(10, depth=2))

Output

[[0],
 [0, 1],
 [0, 1, 2],
 [0, 1, 2, 3],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4, 5],
 [0, 1, 2, 3, 4, 5, 6],
 [0, 1, 2, 3, 4, 5, 6, 7],
 [0, 1, 2, 3, 4, 5, 6, 7, 8],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]
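
The `Process = NoDaemonProcess` override relies on how older interpreters construct pool workers. On Python 3.8 and later, as far as I can tell, `Pool` builds its workers through a context object, so assigning the class attribute no longer works as intended. A minimal sketch of the same idea for newer versions, assuming Python 3.8+, is to hand the pool a context whose `Process` is the non-daemonic subclass. The names `NoDaemonContext`, `NestablePool`, `square`, `squares_up_to` and `nested_demo` are illustrative, not part of the library:

```python
import multiprocessing
import multiprocessing.pool


class NoDaemonProcess(multiprocessing.Process):
    """Process whose 'daemon' flag always reads False and ignores writes."""

    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass  # Pool tries to set daemon=True on its workers; ignore it


class NoDaemonContext(type(multiprocessing.get_context())):
    """Default start-method context, but creating NoDaemonProcess workers."""
    Process = NoDaemonProcess


class NestablePool(multiprocessing.pool.Pool):
    """Pool whose workers are allowed to start child processes."""

    def __init__(self, *args, **kwargs):
        kwargs["context"] = NoDaemonContext()
        super().__init__(*args, **kwargs)


def square(x):
    return x * x


def squares_up_to(n):
    # Runs inside a NestablePool worker. That worker is not daemonic,
    # so it may start an ordinary (daemonic) pool of its own.
    with multiprocessing.Pool(2) as inner:
        return inner.map(square, range(n + 1))


def nested_demo(limit=3):
    # Small, explicit pool sizes keep the total process count bounded.
    with NestablePool(2) as outer:
        return outer.map(squares_up_to, range(limit))
```

Calling `nested_demo()` from under an `if __name__ == "__main__":` guard should return `[[0], [0, 1], [0, 1, 4]]`. The inner pools here are ordinary daemonic ones, so this sketch allows one extra level of nesting; deeper nesting would need `NestablePool` at every level except the last.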

concurrent.futures supports it by default:

# $ pip install futures # on Python 2
from concurrent.futures import ProcessPoolExecutor as Pool
from functools import partial

def foo(x, depth=0):
    if depth == 0:
        return x
    else:
        with Pool() as p:
            return list(p.map(partial(foo, depth=depth-1), range(x + 1)))

if __name__ == "__main__":
    from pprint import pprint
    pprint(foo(10, depth=2))

It produces the same output.

Is it possible to parallelize these jobs some other way?

Yes. For example, look at how celery lets you create complex workflows.
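
celery is one route. Another option, which stays in the standard library and is not what the celery suggestion refers to, is to avoid nesting altogether: flatten the outer and inner loops into one list of small tasks, run them in a single pool, and regroup the results afterwards. A rough sketch under those assumptions follows; `analyze`, `make_tasks`, `regroup` and `run_flat` are hypothetical names, and `analyze` stands in for whatever modules B, C, D actually do:

```python
from concurrent.futures import ProcessPoolExecutor


def analyze(task):
    """Hypothetical leaf job: takes an (outer, inner) pair, returns (outer, result)."""
    outer, inner = task
    return outer, str(inner)


def make_tasks(n):
    """Flatten the two nested loops into a single list of (outer, inner) pairs."""
    return [(outer, inner) for outer in range(n) for inner in range(outer + 1)]


def regroup(pairs, n):
    """Rebuild the nested result shape from (outer, result) pairs."""
    grouped = [[] for _ in range(n)]
    for outer, value in pairs:
        grouped[outer].append(value)
    return grouped


def run_flat(n, max_workers=None):
    """Run every leaf job in one pool; no worker ever creates a pool itself."""
    tasks = make_tasks(n)
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so each group stays ordered.
        return regroup(pool.map(analyze, tasks), n)
```

As with the examples above, `run_flat(...)` should be called from under an `if __name__ == "__main__":` guard. Because only one pool exists, the number of worker processes stays fixed no matter how many outer items there are.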