What exactly do batch_size and pre_dispatch mean in joblib?

Ibr*_*iev 7 python multithreading multiprocessing python-3.x joblib

From the documentation at https://pythonhosted.org/joblib/parallel.html#parallel-reference-documentation it is not clear to me what exactly batch_size and pre_dispatch mean.

Let's consider the case where we use the 'multiprocessing' backend with 2 jobs (2 processes) and have 10 tasks to compute.

As I understand it:

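batch_size controls the number of tasks pickled at one time. So if you set batch_size = 5, joblib will pickle and send 5 tasks to each process at once, and after they arrive they are solved by that process sequentially, one after another. With batch_size=1, joblib will pickle and send one task at a time, if and only if that process has completed the previous task.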

To show what I mean:

def solve_one_task(task):
    # Solves one task at a time
    ....
    return result

def solve_list(list_of_tasks):
    # Solves batch of tasks sequentially
    return [solve_one_task(task) for task in list_of_tasks]

So this code:

Parallel(n_jobs=2, backend = 'multiprocessing', batch_size=5)(
        delayed(solve_one_task)(task) for task in tasks)

is equivalent to this code (in terms of performance):

slices = [(0, 5), (5, 10)]
Parallel(n_jobs=2, backend = 'multiprocessing', batch_size=1)(
        delayed(solve_list)(tasks[start:stop]) for start, stop in slices)
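The grouping described above can be sketched with the standard library alone. This is only an illustration of the idea, not joblib's internal code: `make_batches`, `solve_batch`, and the squaring workload in `solve_one_task` are hypothetical names and placeholders introduced for this example.

```python
from itertools import islice

def make_batches(tasks, batch_size):
    """Yield successive lists holding at most batch_size tasks each."""
    iterator = iter(tasks)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def solve_one_task(task):
    # Placeholder workload (an assumption for this sketch): square the input.
    return task * task

def solve_batch(batch):
    # A worker that receives a batch solves its tasks sequentially.
    return [solve_one_task(task) for task in batch]

tasks = list(range(10))
batches = list(make_batches(tasks, batch_size=5))
# Flatten the per-batch results back into one list, in task order.
results = [r for batch in batches for r in solve_batch(batch)]
print(batches)
print(results)
```

With 10 tasks and batch_size=5 this yields two batches, which matches the two slices in the hand-written version.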

Am I right? And what does pre_dispatch mean then?

Ibr*_*iev 7

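As it turns out, I was right: the two pieces of code are very similar in terms of performance, so batch_size works as I expected in the question. pre_dispatch (as the documentation states) controls the number of instantiated tasks in the task queue.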

# sklearn.externals.joblib has been removed from scikit-learn; import joblib directly
from joblib import Parallel, delayed
from time import sleep, time

def solve_one_task(task):
    # Solves one task at a time
    print("%d. Task #%d is being solved"%(time(), task))
    sleep(5)
    return task

def task_gen(max_task):
    current_task = 0
    while current_task < max_task:
        print("%d. Task #%d was dispatched"%(time(), current_task))
        yield current_task
        current_task += 1

Parallel(n_jobs=2, backend = 'multiprocessing', batch_size=1, pre_dispatch=3)(
        delayed(solve_one_task)(task) for task in task_gen(10))

Output:

1450105367. Task #0 was dispatched
1450105367. Task #1 was dispatched
1450105367. Task #2 was dispatched
1450105367. Task #0 is being solved
1450105367. Task #1 is being solved
1450105372. Task #2 is being solved
1450105372. Task #3 was dispatched
1450105372. Task #4 was dispatched
1450105372. Task #3 is being solved
1450105377. Task #4 is being solved
1450105377. Task #5 was dispatched
1450105377. Task #5 is being solved
1450105377. Task #6 was dispatched
1450105382. Task #7 was dispatched
1450105382. Task #6 is being solved
1450105382. Task #7 is being solved
1450105382. Task #8 was dispatched
1450105387. Task #9 was dispatched
1450105387. Task #8 is being solved
1450105387. Task #9 is being solved
Out[1]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
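The pattern in this log (three tasks pulled from the generator up front, then one more each time a task finishes) can be reproduced with a small toy model. This is a single-worker simulation written for illustration, under the assumption that one new task is drawn per completed task; it is not joblib's actual scheduler, and `simulate_pre_dispatch` is a hypothetical helper.

```python
from collections import deque
from itertools import islice

def simulate_pre_dispatch(task_iter, pre_dispatch):
    """Toy model: keep at most pre_dispatch tasks drawn from the iterator."""
    events = []

    def logged(iterable):
        # Record the moment each task is pulled out of the lazy source.
        for task in iterable:
            events.append(("dispatched", task))
            yield task

    source = logged(task_iter)
    # Up front, instantiate only pre_dispatch tasks.
    queue = deque(islice(source, pre_dispatch))
    while queue:
        task = queue.popleft()
        events.append(("solved", task))
        # Each completed task frees one slot, so pull one more task (if any).
        queue.extend(islice(source, 1))
    return events

events = simulate_pre_dispatch(range(5), pre_dispatch=3)
for kind, task in events:
    print("Task #%d was %s" % (task, kind))
```

The point of the sketch is that the generator is consumed lazily: the number of tasks that have been drawn but not yet solved never exceeds pre_dispatch, which is what keeps memory bounded when the task source is large.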