Importing TensorFlow 2.0 GPU from different processes

Edo*_*doG 5 python-3.x python-multiprocessing tensorflow tensorflow2.0

I am working on a project where I have a Python module that implements an iterative process, and some of the computations are performed on the GPU using TensorFlow 2.0. The module works fine when used on its own in a single process.

Since I have to perform several runs with different parameters, I would like to parallelize the calls, but when I call the module (which imports tensorflow) from different processes I get a CUDA_ERROR_OUT_OF_MEMORY followed by an endless loop of CUDA_ERROR_NOT_INITIALIZED, so the spawned processes hang forever.

Of course I tried limiting the GPU memory, and that works fine if I run two different Python scripts from two different interpreters, but it does not seem to work in my case.
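To be concrete, the kind of per-process memory cap I am referring to looks roughly like this (a sketch; the 1024 MiB limit is an arbitrary value chosen for illustration):

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Pin this process to a fixed 1024 MiB slice of the GPU instead of
    # letting TensorFlow claim all available memory at initialization
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])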

In particular, if I use

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

I get the endless loop of CUDA_ERROR_NOT_INITIALIZED, while if I use:

physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
else:
    print("No GPU found, model running on CPU")

the processes hang as well, but this time the error shows up in every spawned process.

Reading the TensorFlow console output, it looks like the first spawned process allocates memory on the GPU, but then it hangs just like the others, which complain that the memory is exhausted. Strangely, nvidia-smi shows that the GPU memory is not exhausted at all:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN V             Off  | 00000000:03:00.0  On |                  N/A |
| 29%   42C    P8    28W / 250W |    755MiB / 12035MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+



I managed to put together a minimal reproducible example of the problem:

File "tf_module.py"

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)
else:
    print("Running on CPU")

def run(x, y):
    return tf.add(x, y).numpy()

File "run.py"

from multiprocessing import Pool

import tf_module as experiment  # tensorflow is imported here, in the parent process

def run_exp(params):
    a, b = params
    return experiment.run(a, b)

pool = Pool(2)
params = [(a, b) for a in range(3) for b in range(3)]

results = pool.map(run_exp, params)

Moving the TF computation out of the module is not feasible, because it is part of a complex pipeline that also involves numpy, so I need to parallelize this piece of code.
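For what it is worth, one direction I have been wondering about is forcing the 'spawn' start method and importing tensorflow only inside each worker, so that the children start as fresh interpreters instead of inheriting the parent's CUDA state. A sketch of what I mean (untested on my side; tf_module is the module from the example above):

import multiprocessing as mp

def run_exp(params):
    # Deferred import: tensorflow is first loaded here, inside the
    # freshly spawned worker, never in the parent process
    import tf_module as experiment
    a, b = params
    return experiment.run(a, b)

if __name__ == '__main__':
    # 'spawn' starts each worker as a new interpreter instead of forking
    ctx = mp.get_context('spawn')
    with ctx.Pool(2) as pool:
        params = [(a, b) for a in range(3) for b in range(3)]
        results = pool.map(run_exp, params)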

Am I missing something?

Thanks in advance.