Ami*_*ati 6 cuda tensorflow ubuntu-18.04
I have installed Cuda 10.1 and cudnn on Ubuntu 18.04, and they seem to be installed properly: when I run nvcc and nvidia-smi, I get the expected responses:
user:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
user:~$ nvidia-smi
Mon Mar 18 14:36:47 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.43 Driver Version: 418.43 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro K5200 Off | 00000000:03:00.0 On | Off |
| 26% 39C P8 14W / 150W | 225MiB / 8118MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1538 G /usr/lib/xorg/Xorg 32MiB |
| 0 1583 G /usr/bin/gnome-shell 5MiB |
| 0 3008 G /usr/lib/xorg/Xorg 100MiB |
| 0 3120 G /usr/bin/gnome-shell 82MiB |
+-----------------------------------------------------------------------------+
I installed tensorflow using:
user:~$ sudo pip3 install --upgrade tensorflow-gpu
The directory '/home/amin/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/amin/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Requirement already up-to-date: tensorflow-gpu in /usr/local/lib/python3.6/dist-packages (1.13.1)
Requirement already satisfied, skipping upgrade: keras-applications>=1.0.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.0.7)
Requirement already satisfied, skipping upgrade: protobuf>=3.6.1 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (3.6.1)
Requirement already satisfied, skipping upgrade: wheel>=0.26 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (0.32.3)
Requirement already satisfied, skipping upgrade: absl-py>=0.1.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (0.7.0)
Requirement already satisfied, skipping upgrade: keras-preprocessing>=1.0.5 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.0.9)
Requirement already satisfied, skipping upgrade: gast>=0.2.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (0.2.2)
Requirement already satisfied, skipping upgrade: termcolor>=1.1.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.1.0)
Requirement already satisfied, skipping upgrade: grpcio>=1.8.6 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.18.0)
Requirement already satisfied, skipping upgrade: tensorflow-estimator<1.14.0rc0,>=1.13.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.13.0)
Requirement already satisfied, skipping upgrade: six>=1.10.0 in /usr/lib/python3/dist-packages (from tensorflow-gpu) (1.11.0)
Requirement already satisfied, skipping upgrade: numpy>=1.13.3 in /usr/lib/python3/dist-packages (from tensorflow-gpu) (1.13.3)
Requirement already satisfied, skipping upgrade: astor>=0.6.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (0.7.1)
Requirement already satisfied, skipping upgrade: tensorboard<1.14.0,>=1.13.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-gpu) (1.13.1)
Requirement already satisfied, skipping upgrade: h5py in /usr/local/lib/python3.6/dist-packages (from keras-applications>=1.0.6->tensorflow-gpu) (2.9.0)
Requirement already satisfied, skipping upgrade: setuptools in /usr/local/lib/python3.6/dist-packages (from protobuf>=3.6.1->tensorflow-gpu) (40.6.3)
Requirement already satisfied, skipping upgrade: mock>=2.0.0 in /usr/local/lib/python3.6/dist-packages (from tensorflow-estimator<1.14.0rc0,>=1.13.0->tensorflow-gpu) (2.0.0)
Requirement already satisfied, skipping upgrade: werkzeug>=0.11.15 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow-gpu) (0.14.1)
Requirement already satisfied, skipping upgrade: markdown>=2.6.8 in /usr/local/lib/python3.6/dist-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow-gpu) (3.0.1)
Requirement already satisfied, skipping upgrade: pbr>=0.11 in /usr/local/lib/python3.6/dist-packages (from mock>=2.0.0->tensorflow-estimator<1.14.0rc0,>=1.13.0->tensorflow-gpu) (5.1.1)
However, when I try to import tensorflow, I get an error about libcublas.so.10.0:
user:~$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/usr/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/usr/lib/python3.6/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
What am I missing, and how can I fix this?
Thanks
Cal*_*Bot 30
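If you are using Cuda 10.1 (as instructed at https://www.tensorflow.org/install/gpu), the problem is that libcublas.so.10 was moved out of the cuda-10.1 directory and into cuda-10.2(!)
Copied from this answer: https://github.com/tensorflow/tensorflow/issues/26182#issuecomment-684993950
... libcublas.so.10 sits in /usr/local/cuda-10.2/lib64 (a surprise from nvidia: installing 10.1 installs some 10.2 stuff), but only /usr/local/cuda, which points to /usr/local/cuda-10.1, is in the include path.
The fix is to add that directory to your library search path (LD_LIBRARY_PATH):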
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Note: this fix is known to work for Cuda 10.1 V10.1.243 (print your version with nvcc -V).
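To check whether the export took effect, one option is to ask the dynamic loader directly instead of importing all of tensorflow. The following is a minimal sketch, not part of the original answer: the helper name `can_load` is made up for illustration, and the library file name is the one from the ImportError in the question. Keep in mind that LD_LIBRARY_PATH is read when the process starts, so the export has to happen in the shell before Python is launched.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the file is not found on the loader's search path
        # (or when one of its own dependencies cannot be resolved).
        return False
    return True


if __name__ == "__main__":
    # File name taken from the ImportError in the question.
    print("libcublas.so.10.0 loadable:", can_load("libcublas.so.10.0"))
```

If this prints False after the export, the directory added to LD_LIBRARY_PATH probably does not contain a file with exactly that name.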
Ami*_*ati 17
I then installed CUDA 10.0 using the following commands:
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda-10-0
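Then I installed cudnn v7.5.0 for CUDA 10.0 from the CUDNN download link; you need to log in with an account.
After choosing the correct version, I downloaded the file via the CUDNN power link, and then added the cudnn include and lib files as follows: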
sudo cp -P cuda/targets/ppc64le-linux/include/cudnn.h /usr/local/cuda-10.0/include/
sudo cp -P cuda/targets/ppc64le-linux/lib/libcudnn* /usr/local/cuda-10.0/lib64/
sudo chmod a+r /usr/local/cuda-10.0/lib64/libcudnn*
After that, I updated the lib and bin paths for cuda 10.0 in .bashrc; if you do not have these lines, you need to add them to .bashrc:
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
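The `${VAR:+:${VAR}}` expansion in these exports adds a colon plus the old value only when the variable is already set and non-empty. That matters for LD_LIBRARY_PATH, because a stray leading or trailing colon creates an empty entry, which the loader treats as the current directory. As a rough illustration of the same logic (the function name `prepend_path` is hypothetical, not something from the answer):

```python
def prepend_path(new_dir, current):
    """Mimic the shell idiom NEW${VAR:+:${VAR}}.

    `current` is the existing value of the variable, or None/"" when it is
    unset or empty. A colon is added only when there is an old value to keep.
    """
    if current:
        return new_dir + ":" + current
    return new_dir


if __name__ == "__main__":
    print(prepend_path("/usr/local/cuda-10.0/lib64", ""))
    print(prepend_path("/usr/local/cuda-10.0/lib64", "/opt/lib"))
```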
After completing all of these steps, I was able to import tensorflow in python3 successfully.
Jam*_*gan 15
CUDA 10.1 (installed as per the tensorflow docs) throws a can't find libcublas.so.10.0 error. The libraries exist in /usr/local/cuda-10.1/targets/x86_64-linux/lib/ but are misnamed.
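To see which libcublas files are actually on disk, and under what names, a short directory scan can help. This is only a sketch: the directories listed are the ones mentioned in this thread and may not exist on your machine, and `find_libs` is a helper name invented here.

```python
import glob
import os


def find_libs(pattern, directories):
    """Return every file matching `pattern` inside the given directories."""
    found = []
    for directory in directories:
        found.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return found


# Directories mentioned in this thread; adjust to your own installation.
CANDIDATE_DIRS = [
    "/usr/local/cuda-10.0/lib64",
    "/usr/local/cuda-10.1/targets/x86_64-linux/lib",
    "/usr/local/cuda-10.2/lib64",
]

if __name__ == "__main__":
    for path in find_libs("libcublas.so*", CANDIDATE_DIRS):
        print(path)
```

Comparing the printed names against the one in the ImportError (libcublas.so.10.0) shows whether the file is missing entirely or just present under a different version suffix.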
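There is another (lost) stackoverflow post saying this is a pinned-dependency issue with the packages, which can be fixed by passing an extra cli flag to apt. That did not seem to solve the problem for me.
The tested workaround is to modify the instructions to downgrade to CUDA 10.0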
# Uninstall packages from tensorflow installation instructions
sudo apt-get remove cuda-10-1 \
libcudnn7 \
libcudnn7-dev \
libnvinfer6 \
libnvinfer-dev \
libnvinfer-plugin6
# WORKS: Downgrade to CUDA-10.0
sudo apt-get install -y --no-install-recommends \
cuda-10-0 \
libcudnn7=7.6.4.38-1+cuda10.0 \
libcudnn7-dev=7.6.4.38-1+cuda10.0;
sudo apt-get install -y --no-install-recommends \
libnvinfer6=6.0.1-1+cuda10.0 \
libnvinfer-dev=6.0.1-1+cuda10.0 \
libnvinfer-plugin6=6.0.1-1+cuda10.0;
Upgrading to CUDA-10.2 seems to run into the same problem
# BROKEN: Upgrade to CUDA-10.2
# use `apt show -a libcudnn7 libnvinfer7` to find 10.2 compatible version numbers
sudo apt-get install -y --no-install-recommends \
cuda-10-2 \
libcudnn7=7.6.5.32-1+cuda10.2 \
libcudnn7-dev=7.6.5.32-1+cuda10.2;
sudo apt-get install -y --no-install-recommends \
libnvinfer7=7.0.0-1+cuda10.2 \
libnvinfer-dev=7.0.0-1+cuda10.2 \
libnvinfer-plugin7=7.0.0-1+cuda10.2;
Test GPU visibility in Python
python3
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
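The same check can be wrapped so that a failed import (such as the libcublas error in the question) is reported rather than raised. This is a hedged sketch, not the answer's own code: `gpu_visible` is a hypothetical helper, and it assumes that newer TensorFlow releases (2.1 and later, as far as I know) expose `tf.config.list_physical_devices`, falling back to `tf.test.is_gpu_available()` otherwise.

```python
def gpu_visible():
    """Return True/False for GPU visibility, or None if tensorflow cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError as exc:
        # This is where a missing libcublas.so.10.0 would surface.
        print("tensorflow import failed:", exc)
        return None

    config = getattr(tf, "config", None)
    if hasattr(config, "list_physical_devices"):
        # Assumed to be available in newer TensorFlow releases.
        return len(config.list_physical_devices("GPU")) > 0
    # Older releases: the call used in the interactive session above.
    return bool(tf.test.is_gpu_available())


if __name__ == "__main__":
    print("GPU visible:", gpu_visible())
```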
FutureWarnings on tensorflow import
https://github.com/tensorflow/tensorflow/issues/30427
Two solutions:
pip3 install tf-nightly-gpu
pip3 install "numpy<1.17"
Update:
You also need the correct tensorflow version to match your CUDA version
Tensorflow / CUDA version combinations:
See the full list: https://www.tensorflow.org/install/source#tested_build_configurations
You may need to reinstall tensorflow using a named version that matches your CUDA
pip uninstall tensorflow tensorflow-gpu
pip install tensorflow==2.1.0 tensorflow-gpu==2.1.0
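Before picking a tensorflow version, it helps to know exactly which CUDA release nvcc reports. The sketch below pulls the release number out of `nvcc -V` output so it can be compared against the tested build configurations linked above. The function names are made up for illustration, and the sample string is the line from the question's own output.

```python
import re
import subprocess


def parse_cuda_release(nvcc_output):
    """Extract (major, minor) from `nvcc -V` output, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run `nvcc -V` and parse it; returns None if nvcc is not on PATH or fails."""
    try:
        output = subprocess.check_output(["nvcc", "-V"], universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_cuda_release(output)


if __name__ == "__main__":
    # Line taken from the nvcc output shown in the question.
    sample = "Cuda compilation tools, release 10.1, V10.1.105"
    print(parse_cuda_release(sample))
    print(installed_cuda_release())
```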
Then add CUDA to $PATH and $LD_LIBRARY_PATH in ~/.bashrc
~/.bashrc
# CUDA Environment Setup: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#environment-setup
for CUDA_BIN_DIR in `find /usr/local/cuda-*/bin -maxdepth 0`; do export PATH="$PATH:$CUDA_BIN_DIR"; done;
for CUDA_LIB_DIR in `find /usr/local/cuda-*/lib64 -maxdepth 0`; do export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}$CUDA_LIB_DIR"; done;
export PATH=`echo $PATH | tr ':' '\n' | awk '!x[$0]++' | tr '\n' ':' | sed 's/:$//g'` # Deduplicate $PATH
export LD_LIBRARY_PATH=`echo $LD_LIBRARY_PATH | tr ':' '\n' | awk '!x[$0]++' | tr '\n' ':' | sed 's/:$//g'` # Deduplicate $LD_LIBRARY_PATH
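The deduplication step in those last two lines keeps the first occurrence of each entry and preserves order, which is what `awk '!x[$0]++'` does. For readers who find the pipeline hard to follow, here is roughly the same idea in Python. It is a sketch with a hypothetical function name, and unlike the shell version it also drops empty entries.

```python
def dedupe_path(value):
    """Remove repeated entries from a colon-separated path, keeping first occurrences.

    Empty entries are dropped as well (a deliberate difference from the awk
    pipeline, which would keep the first empty one).
    """
    seen = set()
    kept = []
    for entry in value.split(":"):
        if entry and entry not in seen:
            seen.add(entry)
            kept.append(entry)
    return ":".join(kept)


if __name__ == "__main__":
    print(dedupe_path("/usr/local/cuda-10.0/bin:/usr/bin:/usr/local/cuda-10.0/bin"))
```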