Multiprocessing - execute an external command and wait before proceeding

Asked by use*_*280 (score 3). Tags: python, external-process, multiprocessing

I am working on Linux. I have an external executable called "combine" and a loop of 20 iterations. On each iteration, "combine" needs to be called with arguments that depend on the i-th iteration. For example:

arguments = " "

for i in range(1,20):
    arguments += str(i) + "_image.jpg "
    # begin of pseudo-code 
    execute: "./combine" + arguments  # in parallel using all cores

# pseudo-code continues
wait_for_all_previous_process_to_terminate
execute: "./merge_resized_images"  # use all cores - possible for one single command?

How can I achieve this using the multiprocessing module in Python?

Answer by dan*_*ano (score 11)

You can use subprocess.Popen to launch the external commands asynchronously, storing each returned Popen object in a list. Once you have launched all of the processes, just iterate over the list and wait for each one to finish using popen_object.wait().

import shlex
import subprocess

arguments = " "
processes = []
for i in range(1, 21):
    arguments += str(i) + "_image.jpg "
    # Start "./combine" without waiting for it to finish.
    # Note: arguments accumulates, so the i-th call receives images 1..i,
    # mirroring the question's pseudo-code.
    processes.append(subprocess.Popen(shlex.split("./combine" + arguments)))

# Block until every "./combine" process has terminated.
for p in processes:
    p.wait()
subprocess.call("./merge_resized_images")

However, this launches all 20 processes at once, which may hurt performance.

To avoid that, you can use a ThreadPool to limit yourself to a smaller number of concurrent processes (multiprocessing.cpu_count() is a good number), and then use pool.join() to wait for them all to finish.

import multiprocessing
import subprocess
import shlex

from multiprocessing.pool import ThreadPool

def call_proc(cmd):
    """ This runs in a separate thread. """
    #subprocess.call(shlex.split(cmd))  # This will block until cmd finishes
    p = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    return (out, err)


# One worker thread per CPU core; each thread mostly just waits on its subprocess.
pool = ThreadPool(multiprocessing.cpu_count())
results = []
arguments = " "
for i in range(1, 21):
    arguments += str(i) + "_image.jpg "
    results.append(pool.apply_async(call_proc, ("./combine" + arguments,)))

# Close the pool and wait for each running task to complete
pool.close()
pool.join()
for result in results:
    out, err = result.get()
    print("out: {} err: {}".format(out, err))
subprocess.call("./merge_resized_images")

Each thread will release the GIL while waiting for its subprocess to complete, so they will all run in parallel.
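
For reference, the same fan-out/fan-in pattern can also be written with concurrent.futures.ThreadPoolExecutor (in the standard library since Python 3.2). The following is a minimal sketch of that variant, not part of the original answer, assuming the same ./combine and ./merge_resized_images executables:

import multiprocessing
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

def call_proc(cmd):
    """Run cmd in a worker thread and capture its output."""
    p = subprocess.Popen(shlex.split(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    return (out, err)

arguments = " "
with ThreadPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
    futures = []
    for i in range(1, 21):
        arguments += str(i) + "_image.jpg "
        futures.append(executor.submit(call_proc, "./combine" + arguments))
    # as_completed yields each future as soon as its subprocess finishes.
    for future in as_completed(futures):
        out, err = future.result()
        print("out: {} err: {}".format(out, err))
# The with block only exits after every worker has finished.
subprocess.call("./merge_resized_images")

One advantage of as_completed is that you can consume each command's output as soon as it is available, rather than in submission order.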