Lev*_*sky 7 python multiprocessing
I wanted to experiment with different ways of using multiprocessing, starting from this example:
$ cat multi_bad.py
import multiprocessing as mp
from time import sleep
from random import randint

def f(l, t):
    # sleep(30)
    return sum(x < t for x in l)

if __name__ == '__main__':
    l = [randint(1, 1000) for _ in range(25000)]
    t = [randint(1, 1000) for _ in range(4)]
    # sleep(15)
    pool = mp.Pool(processes=4)
    result = pool.starmap_async(f, [(l, x) for x in t])
    print(result.get())
Here l is a list, and when the 4 worker processes are spawned it gets copied 4 times. To avoid that, the documentation offers queues, shared arrays, or proxy objects created via multiprocessing.Manager. For the last option, I changed the definition of l:
$ diff multi_bad.py multi_good.py
10c10,11
< l = [randint(1, 1000) for _ in range(25000)]
---
> man = mp.Manager()
> l = man.list([randint(1, 1000) for _ in range(25000)])
The results still look correct, but the execution time increases dramatically, so I think I'm doing something wrong:
$ time python multi_bad.py
[17867, 11103, 2021, 17918]
real 0m0.247s
user 0m0.183s
sys 0m0.010s
$ time python multi_good.py
[3609, 20277, 7799, 24262]
real 0m15.108s
user 0m28.092s
sys 0m6.320s
The documentation does say that this approach is slower than shared arrays, but this just feels wrong. I'm also not sure how to profile this to get more information about what is going on. Am I missing something?
PS: With shared arrays, my times are below 0.25 s.
PPS: This is on Linux with Python 3.3.
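For reference, a minimal sketch of what the shared-Array variant from the PS might look like (hypothetical multi_shared.py; the names shared and multi_shared.py are my own, map_async replaces starmap_async because only the threshold differs per task, and it relies on the workers inheriting the array through fork, so Linux only):

$ cat multi_shared.py
import multiprocessing as mp
from random import randint

def f(t):
    # `shared` is a module-level global of __main__, inherited by the
    # forked pool workers, so it is not pickled and sent per task
    return sum(x < t for x in shared)

if __name__ == '__main__':
    # lock=False is fine here because the workers only read
    shared = mp.Array('i', [randint(1, 1000) for _ in range(25000)], lock=False)
    t = [randint(1, 1000) for _ in range(4)]
    pool = mp.Pool(processes=4)
    result = pool.map_async(f, t)
    print(result.get())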
Edit: Linux uses copy-on-write when subprocesses are created with os.fork. To demonstrate:
import multiprocessing as mp
import numpy as np
import logging
import os

logger = mp.log_to_stderr(logging.WARNING)

def free_memory():
    total = 0
    with open('/proc/meminfo', 'r') as f:
        for line in f:
            line = line.strip()
            if any(line.startswith(field) for field in ('MemFree', 'Buffers', 'Cached')):
                field, amount, unit = line.split()
                amount = int(amount)
                if unit != 'kB':
                    raise ValueError(
                        'Unknown unit {u!r} in /proc/meminfo'.format(u=unit))
                total += amount
    return total

def worker(i):
    x = data[i, :].sum()    # Exercise access to data
    logger.warn('Free memory: {m}'.format(m=free_memory()))

def main():
    procs = [mp.Process(target=worker, args=(i,)) for i in range(4)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()

logger.warn('Initial free: {m}'.format(m=free_memory()))
N = 15000
data = np.ones((N, N))
logger.warn('After allocating data: {m}'.format(m=free_memory()))

if __name__ == '__main__':
    main()
which produced:
[WARNING/MainProcess] Initial free: 2522340
[WARNING/MainProcess] After allocating data: 763248
[WARNING/Process-1] Free memory: 760852
[WARNING/Process-2] Free memory: 757652
[WARNING/Process-3] Free memory: 757264
[WARNING/Process-4] Free memory: 756760
This shows that initially there was roughly 2.5 GB of memory free. After allocating a 15000x15000 array of float64s, 763248 KB were free. This roughly makes sense, since 15000**2 * 8 bytes = 1.8 GB, and the drop in free memory, 2.5 GB - 0.763248 GB, is also roughly 1.8 GB.
Now, after each process is spawned, the free memory is again reported to be ~750 MB. There is no significant drop in free memory, so I conclude the system must be using copy-on-write.
Conclusion: If you have data that does not need to be modified, defining it at the global level of the __main__ module is a convenient and (at least on Linux) memory-friendly way to share it with subprocesses.
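Applied to the example from the question, a minimal sketch of this approach might look as follows (hypothetical multi_cow.py; it relies on fork, so it is Linux-specific, and l must stay read-only):

$ cat multi_cow.py
import multiprocessing as mp
from random import randint

# Defined at the global level of __main__: the forked pool workers see the
# same pages via copy-on-write instead of receiving a pickled copy per task.
l = [randint(1, 1000) for _ in range(25000)]

def f(t):
    return sum(x < t for x in l)

if __name__ == '__main__':
    t = [randint(1, 1000) for _ in range(4)]
    pool = mp.Pool(processes=4)
    print(pool.map(f, t))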
This is to be expected, because accessing a shared object means having to pickle the request, send it through some kind of signal/syscall, unpickle it, perform the operation, and return the result the same way.
Basically, you should try to avoid sharing memory as much as you can. This leads to more debuggable code (because you have much less concurrency) and the speed-up is greater.
Shared memory should only be used if it is really needed (e.g. sharing gigabytes of data, so that copying it would require too much RAM, or when the processes must be able to interact through that shared memory).
On a side note, using a Manager is probably much slower than a shared Array because the Manager must be able to handle any PyObject * and thus has to pickle/unpickle everything, while arrays can avoid most of this overhead.
From the multiprocessing documentation:
Managers provide a way to create data which can be shared between different processes. A manager object controls a server process which manages shared objects. Other processes can access the shared objects by using proxies.
So using a Manager means spawning a new process that exists just to handle the shared objects; that is probably why it takes so much more time.
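One way to make that round-trip visible: an item lookup on the proxy is really a remote method call to the manager's server process, which BaseProxy._callmethod (documented in the multiprocessing manual) performs explicitly:

>>> import multiprocessing as mp
>>> man = mp.Manager()
>>> L = man.list(range(25000))
>>> L[0]                                 # implicit request to the server process
0
>>> L._callmethod('__getitem__', (0,))   # the same request, made explicitly
0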
If you profile the speed of proxy access, it is a lot slower than a non-shared list:
>>> import timeit
>>> import multiprocessing as mp
>>> man = mp.Manager()
>>> L = man.list(range(25000))
>>> timeit.timeit('L[0]', 'from __main__ import L')
50.490395069122314
>>> L = list(range(25000))
>>> timeit.timeit('L[0]', 'from __main__ import L')
0.03588080406188965
>>> 50.490395069122314 / _
1407.1701119638526
While an Array is not nearly as slow:
>>> L = mp.Array('i', range(25000))
>>> timeit.timeit('L[0]', 'from __main__ import L')
0.6133401393890381
>>> 0.6133401393890381 / 0.03588080406188965
17.09382371507359
Since even the most elementary operations are this slow, and I don't think there is much hope of speeding them up, this means that if you need to share a big list of data and want fast access to it, you should use an Array.
Something that may speed things up a bit is accessing more than one element at a time (e.g. getting a slice instead of single elements), but depending on what you want to do that may or may not be possible.
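As a rough illustration, a small self-contained script comparing single-element access with slice access on a Manager list might look like this (a sketch only; it prints two timings rather than asserting particular numbers):

import timeit
import multiprocessing as mp

if __name__ == '__main__':
    man = mp.Manager()
    L = man.list(range(25000))
    # 2500 single-element lookups: one pickle/IPC round-trip to the manager per item
    print(timeit.timeit('[L[i] for i in range(2500)]', 'from __main__ import L', number=10))
    # one slice lookup: a single round-trip returns the same 2500 items at once
    print(timeit.timeit('L[0:2500]', 'from __main__ import L', number=10))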