Multiprocessing incompatible with NumPy

use*_*966 17 python numpy multiprocessing

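I am trying to run a simple test using multiprocessing. The test works fine until I import numpy (even though it is not used anywhere in the program). Here is the code: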

from multiprocessing import Pool
import time
import numpy as np #this is the problematic line


def CostlyFunc(N):
    """CPU-bound busy loop: N*N iterations of pure-Python arithmetic."""
    tstart = time.time()
    x = 0
    for i in xrange(N):
        for j in xrange(N):
            if i % 2: x += 2
            else: x -= 2       
    print "CostlyFunc : elapsed time %f s" % (time.time() - tstart)
    return x

#serial application
ResultList0 = []
StartTime = time.time()
for i in xrange(3):
    ResultList0.append(CostlyFunc(5000))
print "Elapsed time (serial) : ", time.time() - StartTime


#multiprocessing application
StartTime = time.time()
pool = Pool()
asyncResult = pool.map_async(CostlyFunc, [5000, 5000, 5000])
ResultList1 = asyncResult.get()
print "Elapsed time (multiporcessing) : ", time.time() - StartTime

If I do not import numpy, the result is:

CostlyFunc : elapsed time 2.866265 s
CostlyFunc : elapsed time 2.793213 s
CostlyFunc : elapsed time 2.794936 s
Elapsed time (serial) :  8.45455098152
CostlyFunc : elapsed time 2.889815 s
CostlyFunc : elapsed time 2.891556 s
CostlyFunc : elapsed time 2.898898 s
Elapsed time (multiporcessing) :  2.91595196724

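The total elapsed time is close to the time a single process needs, which means the computation was parallelized. If I do import numpy, the result becomes: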

CostlyFunc : elapsed time 2.877116 s
CostlyFunc : elapsed time 2.866778 s
CostlyFunc : elapsed time 2.860894 s
Elapsed time (serial) :  8.60492110252
CostlyFunc : elapsed time 8.450145 s
CostlyFunc : elapsed time 8.473006 s
CostlyFunc : elapsed time 8.506402 s
Elapsed time (multiporcessing) :  8.55398178101

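The total elapsed time is the same for the serial and the multiprocessing runs, which means only one core was used. The problem clearly comes from numpy. Is there an incompatibility between my version of multiprocessing and NumPy?

I am currently using Python 2.7, NumPy 1.6.2 and multiprocessing 0.70a1 on Linux.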
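
A quick way to check the "only one core" reading is to compare the set of cores the process is allowed to run on before and after the import. This is only a diagnostic sketch, written for Python 3.3+ on Linux rather than the Python 2.7 used above; the helper name `allowed_cores` is made up for illustration, and a changed core set is an assumption about the cause, not something the timings above establish.

```python
import os


def allowed_cores():
    """Return the sorted list of CPU cores this process may run on.

    Returns None where os.sched_getaffinity is unavailable
    (it is provided on Linux, not on every platform).
    """
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))  # 0 means the calling process
    return None


before = allowed_cores()
try:
    import numpy  # the import under suspicion
except ImportError:
    numpy = None  # numpy not installed; both readings will then match
after = allowed_cores()

print("allowed cores before import:", before)
print("allowed cores after import: ", after)
```

If the second list is shorter than the first, worker processes forked after the import would inherit the smaller set, which would be consistent with the serial-looking timings.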

小智 4

(First post, so apologies if it is badly worded or formatted.)

You can stop NumPy from using multithreading by setting MKL_NUM_THREADS to 1.

Under Debian I used:

export MKL_NUM_THREADS=1

Source, from a related Stack Overflow post: Python: How do you stop numpy from multithreading?

Result:
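
The same setting can probably be applied from inside the script instead of the shell, as long as it happens before numpy is first imported. A minimal sketch in Python 3: only MKL_NUM_THREADS comes from this answer; OMP_NUM_THREADS and OPENBLAS_NUM_THREADS are added on the assumption that numpy might be linked against a different threaded library, and the helper name `limit_math_threads` is hypothetical.

```python
import os

# Only MKL_NUM_THREADS is from the answer above; the other two are
# assumptions covering OpenMP- and OpenBLAS-based builds of numpy.
THREAD_VARS = ("MKL_NUM_THREADS", "OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_math_threads(n=1):
    """Set the thread-count variables and return what was set."""
    for name in THREAD_VARS:
        os.environ[name] = str(n)
    return {name: os.environ[name] for name in THREAD_VARS}


# Set the variables first: the safe assumption is that the math library
# picks them up when it is loaded, so numpy is imported only afterwards.
settings = limit_math_threads(1)

try:
    import numpy as np
except ImportError:
    np = None  # numpy not installed; the environment is still configured

print(settings)
```

Worker processes created by `Pool()` after this point inherit the modified environment.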

user@pc:~/tmp$ python multi.py
CostlyFunc : elapsed time 3.847009 s
CostlyFunc : elapsed time 3.253226 s
CostlyFunc : elapsed time 3.415734 s
Elapsed time (serial) :  10.5163660049
CostlyFunc : elapsed time 4.218424 s
CostlyFunc : elapsed time 5.252429 s
CostlyFunc : elapsed time 4.862513 s
Elapsed time (multiporcessing) :  9.11713695526

user@pc:~/tmp$ export MKL_NUM_THREADS=1

user@pc:~/tmp$ python multi.py
CostlyFunc : elapsed time 3.014677 s
CostlyFunc : elapsed time 3.102548 s
CostlyFunc : elapsed time 3.060915 s
Elapsed time (serial) :  9.17840886116
CostlyFunc : elapsed time 3.720322 s
CostlyFunc : elapsed time 3.950583 s
CostlyFunc : elapsed time 3.656165 s
Elapsed time (multiporcessing) :  7.399310112

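I am not sure whether this helps, because I guess you eventually want numpy to run in parallel. Maybe try adjusting numpy's thread count to suit your machine.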
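
One rough way to pick that thread count is to divide the available cores among the worker processes, so that workers times threads does not exceed the machine. This is just a sketch of that arithmetic in Python 3; the function name `threads_per_worker` is made up, and an even split is an assumption, not a tuned recommendation.

```python
import multiprocessing


def threads_per_worker(n_workers, n_cores=None):
    """Split the cores evenly among workers, giving each at least one thread."""
    if n_cores is None:
        n_cores = multiprocessing.cpu_count()
    return max(1, n_cores // n_workers)


# Three workers, as in the question's pool.map_async call, on a
# hypothetical 8-core machine: each worker gets 2 threads.
print(threads_per_worker(3, n_cores=8))
```

The resulting number would then be the value to export as MKL_NUM_THREADS before starting the script.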