Phy*_*ade 4 python joblib python-multiprocessing
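When Joblib runs a function that works on a global variable, on Linux the function can access that global without any copy being made.
We can test this with the following script: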
import joblib
import numpy as np
print("Initializing global")
# Let's create a global that is big, so it takes time to create it
my_global = np.random.uniform(0,100, size=(10**4, 10**4))
print("done")
# A simple function working on the global variable
def fun_with_global():
    return id(my_global)
print("starting // loop")
joblib.Parallel(n_jobs=3, backend="multiprocessing", verbose=100)((joblib.delayed(fun_with_global)() for i in range(1000)))
joblib.Parallel(n_jobs=3, backend="loky", verbose=100)((joblib.delayed(fun_with_global)() for i in range(1000)))
# Both Parallel calls above finish almost instantly, even with 1000 jobs.
# If fun_with_global instead returns id(my_global.copy()), the copy operation makes the calls noticeably slow.
Indeed, the Parallel calls that use the global variable return almost instantly, which means the global is not being pickled and unpickled.
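To make the reasoning above concrete, here is a minimal sketch of what a pickle round trip costs for a large object. The name `payload` and its 10 MB size are assumptions chosen for illustration (much smaller than the globals in the scripts), and only the standard library is used, so this does not show what Joblib itself does internally.

```python
import pickle
import time

# Hypothetical stand-in for a large global (10 MB of zero bytes).
payload = bytes(10**7)

start = time.perf_counter()
# Serialize and deserialize once, as would be needed to ship the object by value.
blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)
elapsed = time.perf_counter() - start

# The pickled form is at least as large as the data itself.
print(f"pickled size: {len(blob)} bytes, round trip: {elapsed:.4f} s")
```

If every one of the 1000 jobs paid this cost for a much larger object, the loop could not finish almost instantly, which is why the timing suggests that no pickling takes place.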
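How this can happen actually depends on the backend:
- For the multiprocessing backend, it is completely logical: multiprocessing workers fork the original process, so the global my_global already exists in each worker's memory at no extra cost.
- For the loky backend, loky is documented as starting its workers with fork/exec, which means they cannot easily access this global.
So how does Loky access the global variable from the parent process without creating a copy or forking?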
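The fork case described above can be sketched with the standard library alone. This is an illustration under assumptions, not Joblib's or loky's implementation: it calls `os.fork` directly (POSIX only), and `read_global_in_forked_child` is a hypothetical helper name. The global here is a 1 MB `bytearray` rather than the large array from the question.

```python
import os

# Hypothetical stand-in for the large global in the question.
my_global = bytearray(10**6)

def read_global_in_forked_child():
    """Fork, read the inherited global in the child, report its length and id."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: my_global is already present in the forked address space.
        # Only a short text report travels through the pipe, not the data.
        try:
            os.close(read_fd)
            report = f"{len(my_global)} {id(my_global)}".encode()
            os.write(write_fd, report)
        finally:
            os._exit(0)
    # Parent: collect the child's report and reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    child_len, child_id = (int(x) for x in data.decode().split())
    return child_len, child_id

child_len, child_id = read_global_in_forked_child()
print(child_len == len(my_global), child_id == id(my_global))
```

The child sees the same object at the same address because fork duplicates the parent's address space. A worker started with fork followed by exec replaces that address space with a fresh interpreter, which is why the loky result is surprising.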
Edit: the example above only behaves this way for globals based on numpy arrays. With another kind of variable, the behavior is different:
import joblib
import numpy as np
import pandas as pd
import time
print("Initializing global")
# This time, let's create a big variable, that is not based on np arrays
with open("/dev/urandom", "rb") as fd:
    my_global = fd.read(10**9)
print("done")
def fun_with_global():
    return id(my_global)
print("starting // loop")
joblib.Parallel(n_jobs=3, backend="multiprocessing", verbose=100)((joblib.delayed(fun_with_global)() for i in range(1000)))
joblib.Parallel(n_jobs=3, backend="loky", verbose=100)((joblib.delayed(fun_with_global)() for i in range(1000)))
# Here the multiprocessing backend still executes instantly, but the loky backend is slow.