未找到 Libdevice

Dav*_*186 8 tensorflow

我正在尝试使用我的 GPU 运行张量流,并按照此链接中的说明进行操作。运行步骤 6 中的命令后,我得到了正确的输出。

然后,当我尝试运行我尝试构建的实际模型时,出现以下错误。

2023-01-06 18:39:14.692537: W tensorflow/compiler/xla/service/gpu/nvptx_helper.cc:56] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-11.2
  /usr/local/cuda
  .
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2023-01-06 18:39:14.693094: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-01-06 18:39:14.693196: I tensorflow/compiler/jit/xla_compilation_cache.cc:477] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
2023-01-06 18:39:14.693275: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-01-06 18:39:14.704458: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-01-06 18:39:14.704603: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
Traceback (most recent call last):
  File "/home/jerry/Woodburn/Woodburn_Model/model/main/Model_Main.py", line 42, in <module>
    main(sys.argv[1:])
  File "/home/jerry/Woodburn/Woodburn_Model/model/main/Model_Main.py", line 27, in main
    model.train()
  File "/home/jerry/Woodburn/Woodburn_Model/model/main/Model_V5.py", line 99, in train
    history = self.model.fit(x, y, batch_size = batchSize, epochs = epochs)
  File "/home/jerry/miniconda3/envs/tensorflow_gpu/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/jerry/miniconda3/envs/tensorflow_gpu/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:

Detected at node 'StatefulPartitionedCall_10' defined at (most recent call last):
    File "/home/jerry/Woodburn/Woodburn_Model/model/main/Model_Main.py", line 42, in <module>
      main(sys.argv[1:])
    File "/home/jerry/Woodburn/Woodburn_Model/model/main/Model_Main.py", line 27, in main
      model.train()
    File "/home/jerry/Woodburn/Woodburn_Model/model/main/Model_V5.py", line 99, in train
      history = self.model.fit(x, y, batch_size = batchSize, epochs = epochs)
    File "/home/jerry/miniconda3/envs/tensorflow_gpu/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/home/jerry/miniconda3/envs/tensorflow_gpu/lib/python3.9/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/home/jerry/miniconda3/envs/tensorflow_gpu/lib/python3.9/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/home/jerry/miniconda3/envs/tensorflow_gpu/lib/python3.9/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/jerry/miniconda3/envs/tensorflow_gpu/lib/python3.9/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/home/jerry/miniconda3/envs/tensorflow_gpu/lib/python3.9/site-packages/keras/engine/training.py", line 1027, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/home/jerry/miniconda3/envs/tensorflow_gpu/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/home/jerry/miniconda3/envs/tensorflow_gpu/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/home/jerry/miniconda3/envs/tensorflow_gpu/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 634, in apply_gradients
      iteration = self._internal_apply_gradients(grads_and_vars)
    File "/home/jerry/miniconda3/envs/tensorflow_gpu/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1166, in _internal_apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File "/home/jerry/miniconda3/envs/tensorflow_gpu/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1216, in _distributed_apply_gradients_fn
      distribution.extended.update(
    File "/home/jerry/miniconda3/envs/tensorflow_gpu/lib/python3.9/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1211, in apply_grad_to_update_var
      return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_10'
libdevice not found at ./libdevice.10.bc
         [[{{node StatefulPartitionedCall_10}}]] [Op:__inference_train_function_8591]
Run Code Online (Sandbox Code Playgroud)

经过一番研究,发现相关错误如下:

2023-01-06 18:39:14.692537: W tensorflow/compiler/xla/service/gpu/nvptx_helper.cc:56] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-11.2
  /usr/local/cuda
  .
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2023-01-06 18:39:14.693094: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-01-06 18:39:14.693196: I tensorflow/compiler/jit/xla_compilation_cache.cc:477] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
2023-01-06 18:39:14.693275: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-01-06 18:39:14.704458: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:326] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-01-06 18:39:14.704603: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:446 : INTERNAL: libdevice not found at ./libdevice.10.bc
Traceback (most recent call last):
Run Code Online (Sandbox Code Playgroud)

对于上下文,它在 Ubuntu 20.04 和 python 3.9 中运行。关于如何修复有什么想法吗?

小智 1

如果您Cudatoolkit通过安装,则可以通过设置withConda来解决问题。您可以在每次激活环境时设置 env 变量,如下所示:https ://conda.io/docs/user-guide/tasks/manage-environments.html#macos-and-linuxXLA_FLAGSexport XLA_FLAGS=--xla_gpu_cuda_data_dir=$CONDA_PREFIX