Segmentation fault when using Python shared memory

Ath*_*dom 6 python macos numpy shared-memory python-3.8

The function store_in_shm writes a numpy array to shared memory, and a second function read_from_shm creates a numpy array backed by the same shared memory block and returns it.

However, running the code under Python 3.8 produces the following segmentation fault:

zsh: segmentation fault  python foo.py

Why is there no problem accessing the numpy array from inside read_from_shm, but accessing the same array again outside the function causes a segmentation fault?

Output:

From read_from_shm(): [0 1 2 3 4 5 6 7 8 9]
zsh: segmentation fault  python foo.py
% /Users/athena/opt/anaconda3/envs/test/lib/python3.8/multiprocessing/resource_tracker.py:203: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

foo.py

import numpy as np
from multiprocessing import shared_memory

def store_in_shm(data):
    shm = shared_memory.SharedMemory(name='foo', create=True, size=data.nbytes)
    shmData = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shmData[:] = data[:]
    shm.close()
    return shm

def read_from_shm(shape, dtype):
    shm = shared_memory.SharedMemory(name='foo', create=False)
    shmData = np.ndarray(shape, dtype, buffer=shm.buf)
    print('From read_from_shm():', shmData)
    return shmData

if __name__ == '__main__':
    data = np.arange(10)
    shm = store_in_shm(data)
    shmData = read_from_shm(data.shape, data.dtype)
    print('From __main__:', shmData)    # no seg fault if we comment this line
    shm.unlink()

Aar*_*ron 7

Basically, the problem seems to be that the underlying mmap'ed file (owned by the shm inside read_from_shm) is being closed when shm is garbage collected as the function returns. shmData still refers to that buffer, which is where you get the segfault (from referencing a closed mmap). This appears to be a known bug, but it can be worked around by keeping a reference to shm.
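In other words, whichever SharedMemory handle the ndarray's buffer came from must stay alive for as long as the array is used. Below is a minimal sketch of that idea (the function name and block name are only illustrative, not part of the original code):

import numpy as np
from multiprocessing import shared_memory

def make_shared_copy(data, name):
    # Copy `data` into a new shared memory block and return BOTH the
    # SharedMemory handle and the ndarray view; dropping the handle lets
    # GC close the underlying mmap and invalidates the view.
    shm = shared_memory.SharedMemory(name=name, create=True, size=data.nbytes)
    view = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    view[:] = data[:]
    return shm, view   # keep the handle alive alongside the view

if __name__ == '__main__':
    shm, view = make_shared_copy(np.arange(10), 'demo_keepalive')
    print(view)        # safe: `shm` is still referenced here
    del view           # drop the view of the buffer first...
    shm.close()        # ...then close this handle
    shm.unlink()       # ...and finally remove the block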

Additionally, all SharedMemory instances want to be close()'d, with exactly one of them being unlink()'ed when the block is no longer needed. If you don't call shm.close() yourself, it will be called during GC as mentioned above, and on Windows, if it is the only handle currently "open", the shared memory file gets deleted. By calling shm.close() inside store_in_shm you introduce an OS dependency: on Windows the data will be deleted at that point, while on macOS and Linux it is retained until unlink is called.
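The usual contract is therefore that every process close()s its own handle and exactly one process unlink()s the block once everyone is done. A hedged sketch of that pattern with a child process (the block name and worker function here are made up for illustration):

import numpy as np
from multiprocessing import Process, shared_memory

def worker(name, shape, dtype):
    # Attach to the existing block by name, read it, then close this
    # process's handle. Do NOT unlink here; the creator owns the block.
    shm = shared_memory.SharedMemory(name=name, create=False)
    view = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    print('worker sees:', view)
    del view          # release the view before closing the buffer
    shm.close()

if __name__ == '__main__':
    data = np.arange(10)
    shm = shared_memory.SharedMemory(name='demo_lifecycle', create=True,
                                     size=data.nbytes)
    np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)[:] = data[:]

    p = Process(target=worker, args=('demo_lifecycle', data.shape, data.dtype))
    p.start()
    p.join()

    shm.close()       # the creator closes its handle as well...
    shm.unlink()      # ...and is the single place unlink() is called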

Finally, although it doesn't show up in your code, there is currently another problem where accessing the data from independent processes (rather than child processes) can likewise delete the underlying mmap too early. SharedMemory is a very new library, and hopefully all the kinks will be worked out soon.

You can rewrite the given example to retain a reference to the "second" shm, and use either one of them to unlink:

import numpy as np
from multiprocessing import shared_memory

def store_in_shm(data):
    shm = shared_memory.SharedMemory(name='foo', create=True, size=data.nbytes)
    shmData = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shmData[:] = data[:]
    #there must always be at least one `SharedMemory` object open for it to not
    #  be destroyed on Windows, so we won't `shm.close()` inside the function,
    #  but rather after we're done with everything.
    return shm

def read_from_shm(shape, dtype):
    shm = shared_memory.SharedMemory(name='foo', create=False)
    shmData = np.ndarray(shape, dtype, buffer=shm.buf)
    print('From read_from_shm():', shmData)
    return shm, shmData #we need to keep a reference of shm both so we don't
                        #  segfault on shmData and so we can `close()` it.

if __name__ == '__main__':
    data = np.arange(10)
    shm1 = store_in_shm(data)
    #This is where Windows would previously reclaim the memory, resulting in a
    #  FileNotFoundError because the temporary mmap'ed file had been released.
    shm2, shmData = read_from_shm(data.shape, data.dtype)
    print('From __main__:', shmData)
    shm1.close() 
    shm2.close()
    #on windows "unlink" happens automatically here whether you call `unlink()` or not.
    shm2.unlink() #either shm1 or shm2