Sharing an array of objects using Python multiprocessing

Question

119*_*631 9 python shared-memory multiprocessing python-3.8

For this question, I refer to the example in the Python docs that discusses "using the SharedMemory class with NumPy arrays, accessing the same numpy.ndarray from two distinct Python shells".

The major change I would like to make is to manipulate an array of class objects rather than integer values, as demonstrated below.

import numpy as np
from multiprocessing import shared_memory    

# a simplistic class example
class A(): 
    def __init__(self, x): 
        self.x = x

# numpy array of class objects 
a = np.array([A(1), A(2), A(3)])       

# create a shared memory instance
shm = shared_memory.SharedMemory(create=True, size=a.nbytes, name='psm_test0')

# numpy array backed by shared memory
b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf)                                    

# copy the original data into shared memory
b[:] = a[:]                                  

print(b)                                            

# array([<__main__.A object at 0x7fac56cd1190>,
#       <__main__.A object at 0x7fac56cd1970>,
#       <__main__.A object at 0x7fac56cd19a0>], dtype=object)

Now, in a different shell, we attach to the shared memory space and try to manipulate the contents of the array.

import numpy as np
from multiprocessing import shared_memory

# attach to the existing shared space
existing_shm = shared_memory.SharedMemory(name='psm_test0')

c = np.ndarray((3,), dtype=object, buffer=existing_shm.buf)

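Even before we are able to manipulate c, printing it results in a segmentation fault. Admittedly, I cannot expect behavior that was never written into the module, so my question is: what can I do to work with a shareable array of objects?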
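A likely explanation for the crash, shown as a minimal sketch: a NumPy array with dtype=object does not hold the objects themselves, only one pointer per element into the memory of the process that created it. Copying those pointers into shared memory and dereferencing them from another process reads addresses that mean nothing there. The check below relies on the CPython detail that id() returns an object's address, so treat it as an illustration under that assumption.

```python
import ctypes
import numpy as np


class A:
    def __init__(self, x):
        self.x = x


objs = [A(1), A(2), A(3)]
a = np.array(objs)  # dtype=object

# Each element occupies exactly one pointer, whatever the object's real size.
pointer_size = ctypes.sizeof(ctypes.c_void_p)
print(a.dtype, a.itemsize, pointer_size)

# Read the array's raw buffer as pointers (CPython-specific assumption:
# id(obj) is the memory address of obj).
raw = (ctypes.c_void_p * len(a)).from_address(a.ctypes.data)
stored = [raw[i] for i in range(len(a))]
print(stored == [id(o) for o in objs])
```

So a.nbytes only covers the pointers, and the instances of A never reach the shared block at all.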

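I am currently pickling the list, but the protected reads/writes add a fair amount of overhead. I have also tried using Namespace, which was quite slow because indexed writes are not allowed. Another idea could be to use a shared ctypes Structure in a ShareableList, but I would not know where to start with that.

There is also a design aspect: there appears to be an open bug in shared_memory that may affect my implementation, in which several processes work on different elements of the array.

Is there a more scalable way of sharing a large list of objects between several processes, such that at any given time every running process interacts with a unique object/element of the list?

UPDATE: At this point, I will also accept partial answers that discuss whether this can be achieved with Python at all.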

Answer 1

Ale*_*xNe 3

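So, I did a bit of research (Shared Memory Objects in Multiprocessing) and came up with a few ideas:

Passing numpy arrays of bytes

Serialize the objects, then save them as byte strings in a numpy array. The problems with this approach are:

The data type has to be passed from the creator of 'psm_test0' to every consumer of 'psm_test0'. This could be done through another shared memory block, though.
pickle and unpickle behave essentially like deepcopy, i.e. they actually copy the underlying data.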
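The first point could be handled roughly as follows. This is only a sketch under stated assumptions: the header layout ("dtype:length", padded to a fixed HEADER_SIZE) is made up for illustration, the pickled dicts stand in for pickled A instances, and both sides run in one process here, whereas a real consumer would be a separate process that knows only the two block names.

```python
import pickle
import numpy as np
from multiprocessing import shared_memory

HEADER_SIZE = 32  # assumed fixed width of the "dtype:length" header

# Producer: the payloads stand in for the pickled A instances.
payloads = [pickle.dumps({"x": v}) for v in (1, 2, "a much longer value")]
a_arr = np.array(payloads)  # dtype is 'S<n>', n = length of the longest payload

data = shared_memory.SharedMemory(create=True, size=a_arr.nbytes)
b_arr = np.ndarray(a_arr.shape, dtype=a_arr.dtype, buffer=data.buf)
b_arr[:] = a_arr[:]

# Publish dtype and length through a second, fixed-size block.
header = f"{a_arr.dtype.str}:{len(a_arr)}".encode()
meta = shared_memory.SharedMemory(create=True, size=HEADER_SIZE)
meta.buf[:HEADER_SIZE] = header.ljust(HEADER_SIZE, b"\0")

# Consumer: read the header first, then rebuild the array view from it.
meta_c = shared_memory.SharedMemory(name=meta.name)
raw_header = bytes(meta_c.buf[:HEADER_SIZE]).rstrip(b"\0").decode()
dtype_str, length = raw_header.split(":")
data_c = shared_memory.SharedMemory(name=data.name)
c_arr = np.ndarray((int(length),), dtype=np.dtype(dtype_str), buffer=data_c.buf)
values = [pickle.loads(item)["x"] for item in c_arr]
print(values)

# Array views must be dropped before the blocks can be closed.
del b_arr, c_arr
for block in (meta_c, data_c):
    block.close()
for block in (meta, data):
    block.close()
    block.unlink()
```

The second point still applies: pickle.loads builds fresh copies, so changes a consumer makes to the unpickled values are not visible to other processes unless they are pickled and written back.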

The code for the "main" process reads as follows:

import pickle
from multiprocessing import shared_memory
import numpy as np


# a simplistic class example
class A():
    def __init__(self, x):
        self.x = x

    def pickle(self):
        return pickle.dumps(self)

    @classmethod
    def unpickle(cls, bts):
        return pickle.loads(bts)


if __name__ == '__main__':
    # Test pickling procedure
    a = A(1)
    print(A.unpickle(a.pickle()).x)
    # >>> 1

    # numpy array of byte strings
    a_arr = np.array([A(1).pickle(), A(2).pickle(), A('This is a really long test string which should exceed 42 bytes').pickle()])

    # create a shared memory instance
    shm = shared_memory.SharedMemory(
        create=True,
        size=a_arr.nbytes,
        name='psm_test0'
    )

    # numpy array backed by shared memory
    b_arr = np.ndarray(a_arr.shape, dtype=a_arr.dtype, buffer=shm.buf)

    # copy the original data into shared memory
    b_arr[:] = a_arr[:]

    print(b_arr.dtype)
    # 'S105'

And for the consumer:

import numpy as np
from multiprocessing import shared_memory
from test import A

# attach to the existing shared space
existing_shm = shared_memory.SharedMemory(name='psm_test0')

c = np.ndarray((3,), dtype='S105', buffer=existing_shm.buf)

# Test data transfer
arr = [a.x for a in list(map(A.unpickle, c))]
print(arr)
# [1, 2, ...]

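I would say you have several ways forward:

Stay with simple data types.
Implement something using the C API, but I cannot really help you there.
Use Rust.
Use Managers. You may lose some performance (I would like to see a real benchmark, though), but you get a relatively safe and simple interface for shared objects.
Use Redis, which also has Python bindings...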
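As an illustration of the first option, here is a minimal sketch that replaces class A with a NumPy structured dtype. The record layout (a single int64 field named x) is an assumption chosen to mirror A, and the "consumer" attaches from the same process only to keep the example self-contained. Because the records live directly in the shared block, an in-place write on one side is visible on the other without any pickling.

```python
import numpy as np
from multiprocessing import shared_memory

# Assumed record layout standing in for class A: one 64-bit integer field "x".
record = np.dtype([("x", np.int64)])

shm = shared_memory.SharedMemory(create=True, size=3 * record.itemsize)
producer = np.ndarray((3,), dtype=record, buffer=shm.buf)
producer["x"] = [1, 2, 3]

# Consumer side: normally a different process attaching by name.
existing = shared_memory.SharedMemory(name=shm.name)
consumer = np.ndarray((3,), dtype=record, buffer=existing.buf)
consumer["x"][1] = 20  # in-place write, no copy and no pickling

seen_by_producer = producer["x"].tolist()
print(seen_by_producer)  # [1, 20, 3]

# Array views must be dropped before the blocks can be closed.
del producer, consumer
existing.close()
shm.close()
shm.unlink()
```

This only works for fixed-size fields, and it does not synchronize access; if several processes might touch the same element, a lock is still needed.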
