I am trying to run Keras code on a GPU node in a cluster. Each GPU node has 4 GPUs, and I made sure that all 4 GPUs on the node are available to me. I run the code below to have TensorFlow use the GPUs:
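import tensorflow as tf

# Memory growth must be configured before the GPUs are initialized.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Raised if memory growth is set after the GPUs have been initialized.
        print(e)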
The output lists 4 available GPUs. However, I get the following error when running the code:
Traceback (most recent call last):
  File "/BayesOptimization.py", line 20, in <module>
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
  File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/framework/config.py", line 439, in list_logical_devices
    return context.context().list_logical_devices(device_type=device_type)
  File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line 1368, in list_logical_devices
    self.ensure_initialized()
  File "/.conda/envs/thesis/lib/python3.9/site-packages/tensorflow/python/eager/context.py", line …
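One thing that may be worth checking in a setup like this is which GPUs the process can actually see before TensorFlow initializes them. The sketch below is only an illustration under assumptions, not part of the original script: `visible_gpu_ids` and `configure_gpus` are hypothetical helper names, and it assumes the cluster restricts GPU visibility through the standard `CUDA_VISIBLE_DEVICES` environment variable, which may not be how this cluster is set up.

```python
import os


def visible_gpu_ids(environ=os.environ):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES, or None if unset.

    Hypothetical helper: ids are kept as strings because the variable may
    hold either indices ("0,1") or device UUIDs.
    """
    raw = environ.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # unset: no restriction is applied through this variable
    # An empty value yields an empty list, i.e. no GPU is visible.
    return [part.strip() for part in raw.split(",") if part.strip()]


def configure_gpus():
    """Enable memory growth on every physical GPU, then count logical GPUs.

    Hypothetical helper: it must run before anything else initializes the
    GPUs, otherwise set_memory_growth raises RuntimeError.
    """
    # Imported here so that visible_gpu_ids works without TensorFlow installed.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.list_logical_devices("GPU")
    return len(gpus), len(logical_gpus)
```

Calling `visible_gpu_ids()` at the top of the script and then `configure_gpus()` would show whether the number of GPUs exposed through the environment matches what TensorFlow reports. Since the traceback is cut off, this does not identify the actual error.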