Related questions and solutions (0)

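Use numpy array in shared memory for multiprocessing

I would like to use a numpy array in shared memory with the multiprocessing module. The difficulty is using it like a numpy array, and not just as a ctypes array.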

from multiprocessing import Process, Array
import numpy

def f(a):
    # Runs in the child process: flip the sign of the first element in place
    a[0] = -a[0]

if __name__ == '__main__':
    # Create the array
    N = 10
    unshared_arr = numpy.random.rand(N)
    arr = Array('d', unshared_arr)
    print("Originally, the first two elements of arr = %s" % arr[:2])

    # Create, start, and finish the child processes
    p = Process(target=f, args=(arr,))
    p.start()
    p.join()

    # Printing out the changed values
    print("Now, the first two elements of arr = %s" % arr[:2])


This produces output such as:

Originally, the first two elements of arr = …
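
One common way to get numpy semantics on top of a `multiprocessing.Array` is to wrap its underlying buffer with `numpy.frombuffer`, which creates a view rather than a copy. The following is only a minimal sketch of that idea, not code from the original question; the helper names `as_numpy` and `negate_first` are hypothetical, and the element type is assumed to be `'d'` (float64) as in the example above.

```python
from multiprocessing import Array, Process

import numpy as np


def as_numpy(shared):
    """Return a NumPy view onto the buffer of a multiprocessing.Array (assumed 'd')."""
    # get_obj() exposes the wrapped ctypes array; frombuffer wraps it without
    # copying, so writes through the view land in the shared memory block.
    return np.frombuffer(shared.get_obj(), dtype=np.float64)


def negate_first(shared):
    """Child-process task: flip the sign of element 0 using NumPy indexing."""
    with shared.get_lock():  # Array(...) is created with a lock by default
        view = as_numpy(shared)
        view[0] = -view[0]


if __name__ == "__main__":
    shared = Array("d", np.random.rand(10))
    before = as_numpy(shared)[0]

    p = Process(target=negate_first, args=(shared,))
    p.start()
    p.join()

    # The parent's view reflects the change made by the child.
    print(before, "->", as_numpy(shared)[0])
```

Because the view shares memory with the `Array`, any numpy operation that writes in place (slicing assignment, `out=` arguments) modifies the shared data, while operations that return new arrays produce ordinary unshared copies.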


python shared numpy multiprocessing

Ian*_*ore

2018 01-11

95 votes

6 answers

60k views

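I have a 60GB SciPy array (matrix) that I must share between 5+ multiprocessing Process objects. I've seen numpy-sharedmem and read this discussion on the SciPy list. There seem to be two approaches: numpy-sharedmem, and using a multiprocessing.RawArray() and mapping NumPy dtypes to ctypes. Right now numpy-sharedmem seems to be the way to go, but I've yet to see a good reference example. I don't need any kind of locks, since the array (actually a matrix) will be read-only. Because of its size, I'd like to avoid a copy. It sounds like the correct method is to create the only copy of the array as a sharedmem array, and then pass it to the Process objects? A few specific questions: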

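What's the best way to actually pass the sharedmem handles to the sub-Process()es? Do I need a queue just to pass one array around? Would a pipe be better? Can I just pass it as an argument to the Process() subclass's init (where I'm assuming it's pickled)?
In the discussion I linked above, there's mention of numpy-sharedmem not being 64-bit safe? I'm definitely using some structures that aren't 32-bit addressable.
Are there tradeoffs to the RawArray() approach? Slower, buggier?
Do I need any ctype-to-dtype mapping for the numpy-sharedmem method?
Does anyone have an example of some open-source code doing this? I'm a very hands-on learner, and it's hard to get this working without a good example to look at.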

If there's any additional info I can provide to help clarify this for others, please comment and I'll add it. Thanks!

This needs to run on Ubuntu Linux and maybe Mac OS, but portability isn't a huge concern.
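
To make the `multiprocessing.RawArray()` route mentioned above more concrete, here is a small, hedged sketch. It is not taken from the question or from numpy-sharedmem; the helper names `make_shared_matrix` and `row_sum` are hypothetical, and the tiny 4x3 matrix merely stands in for the large one. It assumes NumPy 1.16 or later for `numpy.ctypeslib.as_ctypes_type`, which performs the dtype-to-ctypes mapping, and it passes the `RawArray` to each worker through the `Process` arguments rather than through a queue or pipe.

```python
from multiprocessing import Process, Queue, RawArray

import numpy as np


def make_shared_matrix(shape, dtype="float64"):
    """Allocate a lock-free shared buffer and return (raw, numpy_view)."""
    # Map the NumPy dtype to the matching ctypes type (e.g. float64 -> c_double).
    ctype = np.ctypeslib.as_ctypes_type(np.dtype(dtype))
    raw = RawArray(ctype, int(np.prod(shape)))
    view = np.frombuffer(raw, dtype=dtype).reshape(shape)  # view, no copy
    return raw, view


def row_sum(raw, shape, dtype, row, out):
    """Worker: rebuild a view on the shared buffer and read one row."""
    matrix = np.frombuffer(raw, dtype=dtype).reshape(shape)
    matrix.flags.writeable = False  # guard against accidental writes
    out.put((row, float(matrix[row].sum())))


if __name__ == "__main__":
    shape = (4, 3)
    raw, matrix = make_shared_matrix(shape)
    # Fill the shared buffer in place instead of building a second full array.
    matrix[:] = np.arange(12).reshape(shape)

    results = Queue()
    workers = [
        Process(target=row_sum, args=(raw, shape, "float64", r, results))
        for r in range(shape[0])
    ]
    for w in workers:
        w.start()
    sums = sorted(results.get() for _ in workers)
    for w in workers:
        w.join()
    print(sums)
```

Only small results travel through the queue; the matrix itself stays in the shared block. For data that already exists as a large array, filling the shared view in chunks would avoid holding two complete copies at once.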

python numpy shared-memory multiprocessing

Wil*_*ill

2017 02-11

79 votes

4 answers

20k views

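How do I pass large numpy arrays between python subprocesses without saving to disk?

Is there a good way to pass a large chunk of data between two python subprocesses without using the disk? Here's a simplified example of what I'm hoping to accomplish: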

import sys, subprocess, numpy

cmdString = """
import sys, numpy

done = False
while not done:
    cmd = input()
    if cmd == 'done':
        done = True
    elif cmd == 'data':
        ##Fake data. In real life, get data from hardware.
        data = numpy.zeros(1000000, dtype=numpy.uint8)
        data.dump('data.pkl')
        sys.stdout.write('data.pkl' + '\\n')
        sys.stdout.flush()"""

proc = subprocess.Popen( #python vs. pythonw on Windows?
    [sys.executable, '-c', cmdString],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True) # text-mode pipes, so str can be written and read

for i in range(3):
    proc.stdin.write('data\n')
    proc.stdin.flush() # make sure the child sees the command
    print(proc.stdout.readline().rstrip())
    a = numpy.load('data.pkl', allow_pickle=True) # dump() writes a pickle
    print(a.shape)

proc.stdin.write('done\n')
proc.stdin.flush()


This creates a subprocess which generates a numpy array and saves the array to disk. The parent process then loads the array from disk. It works!

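The problem is that our hardware can generate data 10x faster than the disk can read or write. Is there a way to transfer data from one python process to another purely in memory, maybe even without making a copy of the data? Can I do something like passing by reference?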

My first attempt at transferring data purely in memory is pretty lousy: