Why does multiprocessing copy my data if I don't touch it?

Question


Mik*_*ail 5 python multithreading multiprocessing python-3.x

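I was tracking down an out-of-memory bug and was horrified to find that Python's multiprocessing appears to copy large arrays, even when I have no intention of using them.

Why is Python (on Linux) doing this? I thought copy-on-write would protect me from any extra copying. I imagined that whenever I referenced the object, some kind of trap would be triggered, and only then would the copy be made.

For an arbitrary data type (for example, a 30 GB custom dictionary), is the correct way to solve this to use a Monitor? Is there some way to build Python so that it doesn't have this nonsense?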

import numpy as np
import psutil
from multiprocessing import Process
mem = psutil.virtual_memory()
large_amount = int(0.75 * mem.available)

def florp():
    print("florp")

def bigdata():
    return np.ones(large_amount, dtype=np.int8)

if __name__ == '__main__':
    foo = bigdata()  # Allocates 75% of available RAM, no problems
    p = Process(target=florp)
    p.start()  # Out of memory because bigdata is copied?
    print("Wow")
    p.join()

Running:

[ebuild   R    ] dev-lang/python-3.4.1:3.4::gentoo  USE="gdbm ipv6 ncurses readline ssl threads xml -build -examples -hardened -sqlite -tk -wininst" 0 KiB
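As a minimal sketch of the copy-on-write behavior the question expects (assuming Linux or another POSIX system where the "fork" start method is available), a forked child can read a module-level object without it being pickled or sent. DATA is a small hypothetical stand-in for the large array, and child_sum and run are illustrative names. If the kernel refuses the fork itself, p.start() raises OSError with errno.ENOMEM before the child ever runs, which is different from a page actually being copied.

```python
import errno
import multiprocessing as mp

# Hypothetical stand-in for the large array: about 1 MiB of module-level data.
DATA = bytes(range(256)) * 4096

def child_sum(q):
    # The forked child sees DATA through the inherited address space.
    # Touching DATA bumps its reference count, which writes to the page holding
    # the object header (so that page may be copied); the other data pages are only read.
    q.put(sum(DATA[:256]))

def run():
    ctx = mp.get_context("fork")  # "fork" is only available on POSIX systems
    q = ctx.Queue()
    p = ctx.Process(target=child_sum, args=(q,))
    try:
        p.start()
    except OSError as e:
        if e.errno == errno.ENOMEM:
            # fork() itself was refused: no data pages were duplicated at this point.
            print("fork refused with ENOMEM (commit accounting)")
        raise
    result = q.get()  # read before join() so the child can flush the queue
    p.join()
    return result

if __name__ == "__main__":
    print(run())  # prints 32640, the sum of the bytes 0..255
```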

Answer 1

Mik*_*ail 1

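The problem is that, by default, Linux's overcommit accounting charges a fork for the worst-case memory usage: the child is counted as if every private, writable page of the parent might be copied, and that total can indeed exceed the available memory. This happens even though your Python code never touches those variables in the child, so no page has actually been copied yet. To get the copy-on-write behavior you expect, you need to relax overcommit accounting system-wide by switching to the "always overcommit" mode (mode 1); mode 2 is strict accounting and would refuse this fork as well.

sysctl -w vm.overcommit_memory=1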
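A small helper, sketched under the assumption of a Linux system that exposes /proc/sys/vm/overcommit_memory, can confirm which mode is active before and after running the sysctl command. The mode descriptions are paraphrased from the kernel's overcommit-accounting document; OVERCOMMIT_PATH, parse_overcommit_mode and describe_overcommit are hypothetical names.

```python
from pathlib import Path

# Mode meanings paraphrased from the kernel's overcommit-accounting document.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting (don't overcommit)",
}

OVERCOMMIT_PATH = Path("/proc/sys/vm/overcommit_memory")

def parse_overcommit_mode(text):
    """Convert the file's contents, a digit plus a trailing newline, to an int."""
    return int(text.strip())

def describe_overcommit(mode):
    """Return a human-readable name for an overcommit mode."""
    return OVERCOMMIT_MODES.get(mode, "unknown mode %d" % mode)

if __name__ == "__main__":
    if OVERCOMMIT_PATH.exists():  # only present on Linux
        mode = parse_overcommit_mode(OVERCOMMIT_PATH.read_text())
        print(mode, describe_overcommit(mode))
```

Changing the value requires root, and a setting applied with sysctl -w lasts only until reboot unless it is also added to a sysctl configuration file such as /etc/sysctl.conf.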

See https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
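The kernel document describes the strict-mode (mode 2) limit, which /proc/meminfo reports as CommitLimit alongside the current total commitment Committed_AS. The sketch below (hypothetical helper names, values in kB) estimates whether a fork that is charged roughly the parent's private writable memory would fit under that limit. It does not model the mode 0 heuristic, and CommitLimit is only enforced in mode 2.

```python
from pathlib import Path

def parse_meminfo(text):
    """Map each /proc/meminfo field to its numeric value (most are in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[key.strip()] = int(parts[0])  # first token is the number
    return values

def strict_headroom_kb(values):
    """kB that can still be committed before mode 2 accounting refuses."""
    return values["CommitLimit"] - values["Committed_AS"]

def fork_fits_strict(values, private_writable_kb):
    """Rough check, assuming the fork is charged about the parent's private writable memory."""
    return private_writable_kb <= strict_headroom_kb(values)

if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        values = parse_meminfo(meminfo.read_text())
        print("CommitLimit:", values["CommitLimit"], "kB")
        print("Committed_AS:", values["Committed_AS"], "kB")
        print("Headroom:", strict_headroom_kb(values), "kB")
```

For example, with no swap and the default vm.overcommit_ratio of 50, CommitLimit is roughly half of RAM, so under mode 2 even an allocation of 75% of available memory could be refused before any fork happens.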

Archived: 10 years, 4 months ago
Views: 1659
Last recorded: 10 years, 4 months ago