Fastest way to create and fill a huge numpy 2D array?

sol*_*sol 6 python numpy matrix multiprocessing multidimensional-array

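I have to create and fill huge arrays (e.g. 96 GB, 72000 rows * 72000 columns), where every cell holds a float computed from a mathematical formula. The arrays will be used in further computations afterwards.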

import itertools, operator, time, copy, os, sys
import numpy 
from multiprocessing import Pool


def f2(x):  # more complex mathematical formulas that change according to values in *i* and *x*
    temp=[]
    for i in combine:
        temp.append(0.2*x[1]*i[1]/64.23)
    return temp

def combinations_with_replacement_counts(n, r):  #provide all combinations of r balls in n boxes
   size = n + r - 1
   for indices in itertools.combinations(range(size), n-1):
       starts = [0] + [index+1 for index in indices]
       stops = indices + (size,)
       yield tuple(map(operator.sub, stops, starts))

# Defined at module level so that the worker processes can see it.
combine = list(combinations_with_replacement_counts(3, 60))  # 60 is used here, but the real case needs 350

if __name__ == '__main__':
    print(len(combine))
    t1 = time.time()
    pool = Pool()              # start worker processes
    results = [pool.apply_async(f2, (x,)) for x in combine]
    roots = [r.get() for r in results]
    print(roots[0:3])
    pool.close()
    pool.join()
    print(time.time() - t1)
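  • What is the fastest way to create and fill such a huge numpy array? Filling lists, then aggregating them, then converting the result to a numpy array?
  • Given that the cells/columns/rows of the 2D array are independent of each other, can the computation be parallelized to speed up filling the array? Any hints or pointers for optimizing this kind of computation with multiprocessing?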

shx*_*hx2 0

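You can create an empty numpy.memmap array with the desired shape and then use multiprocessing.Pool to fill in its values. Done correctly, this also keeps the memory footprint of each process in the pool relatively small.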
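A minimal sketch of that idea, under a few assumptions: the names `fill_rows`, `fill_memmap`, `block_rows` and the file name `big_array.dat` are made up for illustration, the array is stored as float64, and the formula inside the worker is only a placeholder (row and column indices stand in for the real inputs). Each worker opens the same file itself and writes one block of rows, so only file paths and index ranges are passed between processes.

```python
import os
import tempfile
from multiprocessing import Pool

import numpy as np


def fill_rows(args):
    """Worker: map the existing file and fill rows [start, stop)."""
    path, shape, start, stop = args
    # mode="r+" opens the existing file for reading and writing
    # without truncating it.
    arr = np.memmap(path, dtype=np.float64, mode="r+", shape=shape)
    rows = np.arange(start, stop, dtype=np.float64)[:, None]
    cols = np.arange(shape[1], dtype=np.float64)[None, :]
    # Placeholder formula (an assumption): put the real computation here.
    arr[start:stop, :] = 0.2 * rows * cols / 64.23
    arr.flush()  # push the modified pages back to the file
    return stop - start  # number of rows written by this task


def fill_memmap(path, shape, block_rows=256, processes=None):
    """Create the file-backed array, then fill it block by block in a pool."""
    # mode="w+" creates (or overwrites) a file of the required size.
    arr = np.memmap(path, dtype=np.float64, mode="w+", shape=shape)
    del arr  # the parent does not need its mapping while the workers write
    tasks = [
        (path, shape, start, min(start + block_rows, shape[0]))
        for start in range(0, shape[0], block_rows)
    ]
    with Pool(processes) as pool:
        # Blocks are independent, so completion order does not matter.
        return sum(pool.imap_unordered(fill_rows, tasks))


if __name__ == "__main__":
    shape = (1000, 1000)  # small demo size; the question needs (72000, 72000)
    path = os.path.join(tempfile.mkdtemp(), "big_array.dat")
    rows_written = fill_memmap(path, shape)
    result = np.memmap(path, dtype=np.float64, mode="r", shape=shape)
    print(rows_written, result[2, 3])
```

With 72000 columns of float64, a block of 256 rows is roughly 150 MB, so `block_rows` is the knob that bounds what each process holds in memory at once. The file itself still has to fit on disk, and how fast the fill runs will depend heavily on the storage behind it.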