ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

Ami*_*ati 6 cuda tensorflow ubuntu-18.04

I have installed CUDA 10.1 and cuDNN on Ubuntu 18.04. The installation appears to be correct, since both nvcc and nvidia-smi give the expected output:

    user:~$ nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2019 NVIDIA Corporation
    Built on Fri_Feb__8_19:08:17_PST_2019
    Cuda compilation tools, release 10.1, V10.1.105
    user:~$ nvidia-smi 
    Mon Mar 18 14:36:47 2019       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 418.43       Driver Version: 418.43       CUDA Version: 10.1     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Quadro K5200        Off  | 00000000:03:00.0  On |                  Off |
    | 26%   39C    P8    14W / 150W |    225MiB /  8118MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0      1538      G   /usr/lib/xorg/Xorg                            32MiB |
    |    0      1583      G   /usr/bin/gnome-shell                           5MiB |
    |    0      3008      G   /usr/lib/xorg/Xorg                           100MiB |
    |    0      3120      G   /usr/bin/gnome-shell                          82MiB |
    +-----------------------------------------------------------------------------+

I installed TensorFlow with:

    user:~$ sudo pip3 install --upgrade tensorflow-gpu

The directory '/home/amin/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/amin/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Requirement already up-to-date: tensorflow-gpu in /usr/local/lib/python3.6/dist-packages (1.13.1)
Requirement already satisfied, skipping upgrade: keras-applications>=1.0.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.0.7)
Requirement already satisfied, skipping upgrade: protobuf>=3.6.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (3.6.1)
Requirement already satisfied, skipping upgrade: wheel>=0.26 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (0.32.3)
Requirement already satisfied, skipping upgrade: absl-py>=0.1.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (0.7.0)
Requirement already satisfied, skipping upgrade: keras-preprocessing>=1.0.5 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.0.9)
Requirement already satisfied, skipping upgrade: gast>=0.2.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (0.2.2)
Requirement already satisfied, skipping upgrade: termcolor>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.1.0)
Requirement already satisfied, skipping upgrade: grpcio>=1.8.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.18.0)
Requirement already satisfied, skipping upgrade: tensorflow-estimator<1.14.0rc0,>=1.13.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.13.0)
Requirement already satisfied, skipping upgrade: six>=1.10.0 in /usr/lib/python3/dist-packages (from tensorflow-gpu) (1.11.0)
Requirement already satisfied, skipping upgrade: numpy>=1.13.3 in /usr/lib/python3/dist-packages (from tensorflow-gpu) (1.13.3)
Requirement already satisfied, skipping upgrade: astor>=0.6.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (0.7.1)
Requirement already satisfied, skipping upgrade: tensorboard<1.14.0,>=1.13.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.13.1)
Requirement already satisfied, skipping upgrade: h5py in /usr/local/lib/python3.6/dist-packages (from keras-applications>=1.0.6->tensorflow-gpu) (2.9.0)
Requirement already satisfied, skipping upgrade: setuptools in /usr/local/lib/python3.6/dist-packages (from protobuf>=3.6.1->tensorflow-gpu) (40.6.3)
Requirement already satisfied, skipping upgrade: mock>=2.0.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-estimator<1.14.0rc0,>=1.13.0->tensorflow-gpu) (2.0.0)
Requirement already satisfied, skipping upgrade: werkzeug>=0.11.15 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow-gpu) (0.14.1)
Requirement already satisfied, skipping upgrade: markdown>=2.6.8 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow-gpu) (3.0.1)
Requirement already satisfied, skipping upgrade: pbr>=0.11 in /usr/local/lib/python3.6/dist-packages (from mock>=2.0.0->tensorflow-estimator<1.14.0rc0,>=1.13.0->tensorflow-gpu) (5.1.1)

However, when I try to import tensorflow, I get an error about libcublas.so.10.0:

    user:~$ python3
    Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
    [GCC 8.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow as tf
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
        from tensorflow.python.pywrap_tensorflow_internal import *
      File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
        _pywrap_tensorflow_internal = swig_import_helper()
      File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
        _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
      File "/usr/lib/python3.6/imp.py", line 243, in load_module
        return load_dynamic(name, filename, file)
      File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
        return _load(spec)
    ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 24, in <module>
        from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
      File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
        from tensorflow.python import pywrap_tensorflow
      File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
        raise ImportError(msg)
    ImportError: Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
        from tensorflow.python.pywrap_tensorflow_internal import *
      File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
        _pywrap_tensorflow_internal = swig_import_helper()
      File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
        _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
      File "/usr/lib/python3.6/imp.py", line 243, in load_module
        return load_dynamic(name, filename, file)
      File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
        return _load(spec)
    ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory


    Failed to load the native TensorFlow runtime.

    See https://www.tensorflow.org/install/errors

    for some common reasons and solutions.  Include the entire stack trace
    above this error message when asking for help.

What am I missing, and how can I fix it?

Thanks

Cal*_*Bot 30

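If you are using CUDA 10.1 (as instructed at https://www.tensorflow.org/install/gpu), the problem is that libcublas.so.10 has been moved out of the cuda-10.1 directory and into cuda-10.2 (!)

Copied from this answer: https://github.com/tensorflow/tensorflow/issues/26182#issuecomment-684993950

... libcublas.so.10 sits in /usr/local/cuda-10.2/lib64 (a surprise from NVIDIA: an install of 10.1 installs some 10.2 stuff), but only /usr/local/cuda, which points to /usr/local/cuda-10.1, is on the search path.

The fix is to add that directory to your library search path (LD_LIBRARY_PATH):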

export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Note: this fix is known to work for CUDA 10.1 V10.1.243 (print your version with nvcc -V).
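
To see which CUDA directory actually holds libcublas on your own machine, a small script can help. This is only a sketch: the function name find_cuda_libs is made up for illustration, and it assumes the toolkits live under /usr/local in either the lib64 layout or the targets/*/lib layout mentioned in this thread.

```python
import glob
import os


def find_cuda_libs(name="libcublas.so", root="/usr/local"):
    """Return {directory: [matching file names]} for every cuda* tree under root."""
    found = {}
    # Assumed layouts (both appear in this thread):
    #   <root>/cuda*/lib64/ and <root>/cuda*/targets/<arch>/lib/
    patterns = [
        os.path.join(root, "cuda*", "lib64", name + "*"),
        os.path.join(root, "cuda*", "targets", "*", "lib", name + "*"),
    ]
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            directory = os.path.dirname(path)
            found.setdefault(directory, []).append(os.path.basename(path))
    return found


if __name__ == "__main__":
    for directory, files in find_cuda_libs().items():
        print(directory, files)
```

Whichever directory it lists for libcublas.so.10 is the one that needs to be on LD_LIBRARY_PATH.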

  • Best answer ever! You saved my life! Thank you so much! (4 upvotes)
  • Why would they do something so strange? (3 upvotes)

Ami*_*ati 17

I downloaded CUDA 10.0 from the CUDA 10.0 download link.

Then I installed it with the following commands:

sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda-10-0

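Then I installed cuDNN v7.5.0 for CUDA 10.0, downloaded via the cuDNN download link; you need to log in with an account.

After choosing the correct version, I downloaded the file via the cuDNN Power link, and then added the cuDNN include and lib files as follows: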

sudo cp -P cuda/targets/ppc64le-linux/include/cudnn.h /usr/local/cuda-10.0/include/
sudo cp -P cuda/targets/ppc64le-linux/lib/libcudnn* /usr/local/cuda-10.0/lib64/
sudo chmod a+r /usr/local/cuda-10.0/lib64/libcudnn*

Then I updated the CUDA 10.0 bin and lib paths in .bashrc; if they are not there yet, you need to add them to .bashrc:

export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

After completing all these steps, I was able to import tensorflow in python3 successfully.
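
A quick way to confirm that the loader can now find the libraries, before importing tensorflow, is to try opening them with ctypes. This is a minimal sketch, assuming the sonames libcublas.so.10.0 (from the error in the question) and libcudnn.so.7 (cuDNN 7.x); can_load is a hypothetical helper name. Run it in a new shell so that the updated .bashrc has been read.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can resolve and open `soname`."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be opened.
        return False


if __name__ == "__main__":
    # Assumed sonames for a CUDA 10.0 + cuDNN 7.x setup.
    for soname in ("libcublas.so.10.0", "libcudnn.so.7"):
        print(soname, "OK" if can_load(soname) else "NOT FOUND")
```

If libcublas.so.10.0 is reported as NOT FOUND, the tensorflow import will most likely fail with the same ImportError as in the question.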


Jam*_*gan 15

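CUDA 10.1 (installed per the TensorFlow docs) raises a "can't find libcublas.so.10.0" error. The libraries exist in /usr/local/cuda-10.1/targets/x86_64-linux/lib/ but are misnamed.

There was another (now lost) Stack Overflow post saying this is a pinned-dependency issue with the packages that can be fixed by passing an extra CLI flag to apt. That did not seem to solve my problem.

A tested workaround is to modify the instructions to downgrade to CUDA 10.0: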

# Uninstall packages from tensorflow installation instructions 
sudo apt-get remove cuda-10-1 \
    libcudnn7 \
    libcudnn7-dev \
    libnvinfer6 \
    libnvinfer-dev \
    libnvinfer-plugin6

# WORKS: Downgrade to CUDA-10.0
sudo apt-get install -y --no-install-recommends \
    cuda-10-0 \
    libcudnn7=7.6.4.38-1+cuda10.0  \
    libcudnn7-dev=7.6.4.38-1+cuda10.0;
sudo apt-get install -y --no-install-recommends \
    libnvinfer6=6.0.1-1+cuda10.0 \
    libnvinfer-dev=6.0.1-1+cuda10.0 \
    libnvinfer-plugin6=6.0.1-1+cuda10.0;

Upgrading to CUDA 10.2 appears to suffer from the same problem:

# BROKEN: Upgrade to CUDA-10.2 
# use `apt show -a libcudnn7 libnvinfer7` to find 10.2 compatible version numbers
sudo apt-get install -y --no-install-recommends \
    cuda-10-2 \
    libcudnn7=7.6.5.32-1+cuda10.2  \
    libcudnn7-dev=7.6.5.32-1+cuda10.2;
sudo apt-get install -y --no-install-recommends \
    libnvinfer7=7.0.0-1+cuda10.2 \
    libnvinfer-dev=7.0.0-1+cuda10.2 \
    libnvinfer-plugin7=7.0.0-1+cuda10.2;

Test GPU visibility in Python:

python3
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
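
If the import still fails, the name of the missing library is buried in a long traceback. The sketch below pulls it out of the error text; missing_library is a hypothetical helper, and the regular expression assumes the loader's usual "cannot open shared object file" wording shown in the question.

```python
import re


def missing_library(error_text):
    """Extract the library name from a 'cannot open shared object file' message."""
    match = re.search(r"(\S+?): cannot open shared object file", error_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError as exc:
        print("TensorFlow failed to import; missing library:",
              missing_library(str(exc)))
    else:
        # Same check as in the interactive session above.
        print("GPU available:", tf.test.is_gpu_available())
```

The version suffix of the reported name (for example the 10.0 in libcublas.so.10.0) tells you which CUDA release that TensorFlow build was linked against.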

FutureWarnings on tensorflow import

https://github.com/tensorflow/tensorflow/issues/30427

Two solutions:

  • pip3 install tf-nightly-gpu
  • pip3 install "numpy<1.17"

Update:

You also need the right TensorFlow version to match your CUDA version.

TensorFlow / CUDA version combinations:

  • TensorFlow v2.x does not support CUDA 9 (the Ubuntu 18.04 default)
  • TensorFlow v2.1.0 works with CUDA 10.1
  • TensorFlow v2.0.0 works with CUDA 10.0

See the full list: https://www.tensorflow.org/install/source#tested_build_configurations
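
The pairs mentioned in this thread can be written down as a small lookup. This is an illustrative sketch, not an official compatibility API: the table holds only combinations named on this page (the 1.13.1 entry is inferred from the question, where tensorflow-gpu 1.13.1 looks for libcublas.so.10.0), and the helper names are made up.

```python
# Pairs taken from this thread only; the tested-build table linked above
# is the authoritative source.
TF_TO_CUDA = {
    "1.13.1": "10.0",  # inferred: this build looks for libcublas.so.10.0
    "1.15.0": "10.0",
    "2.0.0": "10.0",
    "2.1.0": "10.1",
}


def required_cuda(tf_version):
    """Return the CUDA version paired with tf_version here, or None if unlisted."""
    return TF_TO_CUDA.get(tf_version)


def is_match(tf_version, cuda_version):
    """True only when the pair is one of the combinations listed above."""
    expected = required_cuda(tf_version)
    return expected is not None and expected == cuda_version


if __name__ == "__main__":
    # The question's setup: tensorflow-gpu 1.13.1 with CUDA 10.1 installed.
    print(is_match("1.13.1", "10.1"))
```

For the setup in the question this reports a mismatch, which is consistent with the missing libcublas.so.10.0.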

You may need to reinstall tensorflow with a named version that matches your CUDA:

pip uninstall tensorflow tensorflow-gpu
pip install tensorflow==2.1.0 tensorflow-gpu==2.1.0

Then add CUDA to $PATH and $LD_LIBRARY_PATH in ~/.bashrc:

~/.bashrc

# CUDA Environment Setup: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#environment-setup
for CUDA_BIN_DIR in `find /usr/local/cuda-*/bin   -maxdepth 0`; do export PATH="$PATH:$CUDA_BIN_DIR"; done;
for CUDA_LIB_DIR in `find /usr/local/cuda-*/lib64 -maxdepth 0`; do export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}$CUDA_LIB_DIR"; done;

export            PATH=`echo $PATH            | tr ':' '\n' | awk '!x[$0]++' | tr '\n' ':' | sed 's/:$//g'` # Deduplicate $PATH
export LD_LIBRARY_PATH=`echo $LD_LIBRARY_PATH | tr ':' '\n' | awk '!x[$0]++' | tr '\n' ':' | sed 's/:$//g'` # Deduplicate $LD_LIBRARY_PATH
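
For anyone who finds the tr/awk/sed pipeline hard to read, the same deduplication can be sketched in Python. dedupe_path is a hypothetical helper; unlike the shell version it also drops empty entries, which is an assumption about what you want.

```python
import os


def dedupe_path(value, sep=":"):
    """Drop repeated entries from a PATH-style string, keeping first occurrences."""
    seen = set()
    kept = []
    for entry in value.split(sep):
        # Skip empty entries (assumption) and anything already seen.
        if entry and entry not in seen:
            seen.add(entry)
            kept.append(entry)
    return sep.join(kept)


if __name__ == "__main__":
    print(dedupe_path(os.environ.get("LD_LIBRARY_PATH", "")))
```

Order is preserved, so the directory listed first still takes priority when the loader searches for a library.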

  • If you are targeting TensorFlow 1.15.0 on Ubuntu 18.04, choose CUDA 10.0; this post is correct, regardless of what is written here: https://www.tensorflow.org/install/gpu (3 upvotes)