Python manager dict in multiprocessing

Bru*_*uce 21 python multiprocessing

Here is a simple piece of multiprocessing code:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    d[1].append(4)
    print(d)

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

The output I get is:

{1: []}

Why don't I get {1: [4]} as the output?

aka*_*Rem 28

Here is what you wrote:

# From here on, the code runs in the main process and again in every child
# process that re-imports this module (as happens on Windows).
# Each such process performs these imports.
from multiprocessing import Process, Manager

# Each such process creates its own 'manager' and 'd'.
manager = Manager()
# BTW, the Manager is itself a child process, so its initialization creates
# a new Manager, and that new Manager creates another one, and so on.
# Did you check how many python processes were running on your system? A lot!
d = manager.dict()

def f():
    # 'd' is the 'd' defined in the globals of this, the current process
    d[1].append(4)
    print(d)

if __name__ == '__main__':
    # From here on, the code runs ONLY in the main process
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

Here is what you should have written:

from multiprocessing import Process, Manager
def f(d):
    d[1] = d[1] + [4]
    print(d)

if __name__ == '__main__':
    manager = Manager() # create only 1 mgr
    d = manager.dict() # create only 1 dict
    d[1] = []
    p = Process(target=f,args=(d,)) # say to 'f', in which 'd' it should append
    p.start()
    p.join()

  • @akaRem, you saved my life. It should be stated very clearly somewhere that Manager() should be a single global object for the whole application. (2 upvotes)

Yoe*_*oel 12

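The reason the new item is not appended to d[1] is stated in Python's official documentation:

Modifications to mutable values or items in dict and list proxies will not be propagated through the manager, because the proxy has no way of knowing when its values or items are modified. To modify such an item, you can re-assign the modified object to the container proxy.

Therefore, this is what actually happens: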

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # invoke d.__getitem__(), returning a local copy of the empty list assigned by the main process,
    # (consider that a KeyError exception wasn't raised, so a list was definitely returned),
    # and append 4 to it, however this change is not propagated through the manager,
    # as it's performed on an ordinary list with which the manager has no interaction
    d[1].append(4)
    # convert d to string via d.__str__() (see https://docs.python.org/2/reference/datamodel.html#object.__str__),
    # returning the "remote" string representation of the object (see https://docs.python.org/2/library/multiprocessing.html#multiprocessing.managers.SyncManager.list),
    # to which the change above was not propagated
    print(d)

if __name__ == '__main__':
    # invoke d.__setitem__(), propagating this assignment (mapping 1 to an empty list) through the manager
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

Reassigning a new list to d[1] after the update, or even the same list again, triggers the manager to propagate the change:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # perform the exact same steps, as explained in the comments to the previous code snippet above,
    # but in addition, invoke d.__setitem__() with the changed item in order to propagate the change
    l = d[1]
    l.append(4)
    d[1] = l
    print(d)

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

The line d[1] += [4] would have worked as well.
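The three variants discussed so far can be compared side by side without a child process, since the copy semantics come from the proxy itself. This is only a minimal sketch written for Python 3; the function name compare_updates is made up for illustration and is not part of multiprocessing.

```python
from multiprocessing import Manager


def compare_updates():
    """Return d[1] after an in-place append, a reassignment, and `+=`."""
    with Manager() as manager:      # context-manager form needs Python 3.3+
        d = manager.dict()
        d[1] = []

        d[1].append(4)              # mutates a temporary local copy only
        after_append = d[1]

        items = d[1]
        items.append(4)
        d[1] = items                # __setitem__ sends the list back
        after_reassign = d[1]

        d[1] += [5]                 # __getitem__, list +=, then __setitem__
        after_iadd = d[1]

        return after_append, after_reassign, after_iadd


if __name__ == "__main__":
    print(compare_updates())
```

Run as a script, this should print ([], [4], [4, 5]): the bare append is lost, while both forms that end in an assignment to d[1] reach the manager.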


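Alternatively, since Python 3.6, per this changeset following this issue, it is also possible to use nested proxy objects, which automatically propagate any changes made to them to the containing proxy object. Thus, replacing the line d[1] = [] with d[1] = manager.list() would fix the issue as well: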

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    d[1].append(4)
    # the __str__() method of a dict object invokes __repr__() on each of its items,
    # so explicitly invoking __str__() is required in order to print the actual list items
    print({k: str(v) for k, v in d.items()})

if __name__ == '__main__':
    d[1] = manager.list()
    p = Process(target=f)
    p.start()
    p.join()

Unfortunately, this bug fix was not ported to Python 2.7 (as of Python 2.7.13).
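To see the nested-proxy behaviour in isolation, the following sketch (assuming Python 3.6 or later; nested_list_demo is a hypothetical name chosen for this example) stores a manager.list() inside the manager dict and checks that d[1] then hands back a proxy rather than a plain local copy.

```python
from multiprocessing import Manager


def nested_list_demo():
    """Append through a nested list proxy and report what d[1] holds."""
    with Manager() as manager:
        d = manager.dict()
        d[1] = manager.list()       # store a list proxy, not a plain list
        d[1].append(4)              # forwarded to the shared list

        value = d[1]
        is_plain_list = isinstance(value, list)
        return is_plain_list, list(value)   # copy the contents out


if __name__ == "__main__":
    print(nested_list_demo())
```

On Python 3.6+ this should print (False, [4]): the value is not a plain list, and the append survived without any reassignment.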


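NOTE (running under the Windows operating system):

Although the described behaviour applies to Windows as well, the attached code snippets would fail when executed under Windows because of its different process creation mechanism, which relies on the CreateProcess() API rather than the fork() system call, which is not supported.

Whenever a new process is created via the multiprocessing module, Windows starts a fresh Python interpreter process that imports the main module, with potentially hazardous side effects. To avoid this, the following programming guideline is recommended:

Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).

Therefore, executing the attached code snippets as is under Windows would try to create an infinite number of processes because of the manager = Manager() line. This is easily fixed by creating the Manager and Manager.dict objects inside the if __name__ == '__main__' clause and passing the Manager.dict object as an argument to f(), as done in this answer.

More details on the issue can be found in this answer.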
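Putting the guideline and the reassignment fix together, a Python 3 version that should also be importable safely by a freshly started interpreter might look like the sketch below. The names append_item and run are illustrative only; nothing is created at import time, and the dict proxy is passed to the child explicitly.

```python
from multiprocessing import Manager, Process


def append_item(shared, key, item):
    """Child-side update: fetch a copy, change it, assign it back."""
    values = shared[key]            # local copy of the stored list
    values.append(item)
    shared[key] = values            # reassignment propagates the change


def run():
    """Create the manager and child only when called, never on import."""
    with Manager() as manager:
        shared = manager.dict()
        shared[1] = []
        worker = Process(target=append_item, args=(shared, 1, 4))
        worker.start()
        worker.join()
        return shared.copy()        # plain dict snapshot taken before shutdown


if __name__ == "__main__":
    print(run())
```

Keeping the Manager() call inside run(), and run() behind the __name__ guard, is what stops a re-imported main module from starting processes of its own.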


Car*_*res 11

I think this is a bug in the manager's proxy calls. You can work around it by not calling methods on the shared list directly, like this:

from multiprocessing import Process, Manager

manager = Manager()
d = manager.dict()

def f():
    # get the shared list
    shared_list = d[1]

    shared_list.append(4)

    # forces the shared list to 
    # be serialized back to manager
    d[1] = shared_list

    print(d)

if __name__ == '__main__':
    d[1] = []
    p = Process(target=f)
    p.start()
    p.join()

    print(d)