mar*_*lon · 5 · cuda, conda, tensorflow2.0
My current setup:
nvidia-smi
Wed Aug 4 01:40:39 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.79 Driver Version: 410.79 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:0C.0 Off | 0 |
| N/A 34C P0 37W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 00000000:00:0D.0 Off | 0 |
| N/A 34C P0 36W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... On | 00000000:00:0E.0 Off | 0 |
| N/A 33C P0 39W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... On | 00000000:00:0F.0 Off | 0 |
| N/A 37C P0 41W / 300W | 0MiB / 16130MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
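The driver version can also be read from a script instead of from this table: `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints one line per GPU. Below is a minimal Python sketch that assumes `nvidia-smi` is on `PATH`. `query_driver_versions` and `parse_driver_versions` are hypothetical helper names.

```python
import subprocess
from typing import List


def parse_driver_versions(output: str) -> List[str]:
    """Return one driver-version string per non-empty line of nvidia-smi CSV output."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def query_driver_versions() -> List[str]:
    """Ask nvidia-smi (assumed to be on PATH) for the driver version of every GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_driver_versions(result.stdout)
```

On the machine above, `query_driver_versions()` would be expected to return `"410.79"` once per GPU.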
I want to install TensorFlow 2.3/2.4, so I need to upgrade CUDA in conda to at least 10.1. I know how to install the CUDA toolkit in conda:
conda install cudatoolkit=10.1
But that doesn't seem to be enough:
Status: CUDA driver version is insufficient for CUDA runtime version
Can I update CUDA to 10.1 through conda while keeping the old CUDA 10.0? This doesn't work:
conda install cuda=10.1
I'm using Python 3.8. If I can't keep CUDA 10.0, how can I upgrade CUDA directly to 10.1, with or without conda? Doing it in conda would be best.
Update:
I installed cudatoolkit=10.1, but the CUDA driver still doesn't work. My conda environment list shows:
cudatoolkit 10.1.243 h6bb024c_0
tensorflow-gpu 2.3.0 pypi_0 pypi
The following test works:
import tensorflow as tf
2021-08-04 04:21:31.110443: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
In [3]: print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
2021-08-04 04:21:34.499432: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-08-04 04:21:34.665738: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.666369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:00:0c.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-08-04 04:21:34.666459: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.667017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 1 with properties:
pciBusID: 0000:00:0d.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-08-04 04:21:34.667064: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.667613: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 2 with properties:
pciBusID: 0000:00:0e.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.75GiB deviceMemoryBandwidth: 836.37GiB/s
2021-08-04 04:21:34.667644: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-08-04 04:21:34.670275: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-08-04 04:21:34.672971: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-08-04 04:21:34.673378: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-08-04 04:21:34.676043: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-08-04 04:21:34.677370: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-08-04 04:21:34.681850: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-08-04 04:21:34.681989: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.682604: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.683196: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.683782: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.684353: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.684961: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-08-04 04:21:34.685513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1, 2
Num GPUs Available: 3
But the following test fails:
import tensorflow as tf

with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# TensorFlow 2.x runs eagerly and has no tf.Session (only tf.compat.v1.Session),
# so the result can be printed directly.
print(c)
Error message:
2021-08-04 04:27:30.934969: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0, 1, 2
2021-08-04 04:27:30.935028: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
---------------------------------------------------------------------------
InternalError Traceback (most recent call last)
......
InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
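The error comes from `cudaGetDevice()`, the first call that needs the CUDA runtime to talk to the driver. This is likely why listing the devices in the earlier test still worked. The two versions involved can be read directly from the runtime library with `ctypes`:

* `cudaDriverGetVersion` reports the CUDA version the installed driver supports.
* `cudaRuntimeGetVersion` reports the runtime's own version.

Both are encoded as `1000 * major + 10 * minor`, so 10010 means 10.1. The sketch below assumes `libcudart.so.10.1` from the conda environment is on the loader path. The helper names are hypothetical.

```python
import ctypes
from typing import Tuple

Version = Tuple[int, int]


def decode_cuda_version(encoded: int) -> Version:
    """Split the 1000 * major + 10 * minor encoding, e.g. 10010 -> (10, 1)."""
    return encoded // 1000, (encoded % 1000) // 10


def read_versions(libname: str = "libcudart.so.10.1") -> Tuple[Version, Version]:
    """Return (driver-supported CUDA version, runtime CUDA version) via the runtime API."""
    cudart = ctypes.CDLL(libname)  # assumes the conda cudatoolkit library is loadable
    driver = ctypes.c_int(0)
    runtime = ctypes.c_int(0)
    # Both functions return cudaSuccess (0) when they succeed.
    if cudart.cudaDriverGetVersion(ctypes.byref(driver)) != 0:
        raise RuntimeError("cudaDriverGetVersion failed")
    if cudart.cudaRuntimeGetVersion(ctypes.byref(runtime)) != 0:
        raise RuntimeError("cudaRuntimeGetVersion failed")
    return decode_cuda_version(driver.value), decode_cuda_version(runtime.value)


def driver_too_old(driver: Version, runtime: Version) -> bool:
    """For CUDA 10.x, a runtime newer than the driver's version triggers this error."""
    return runtime > driver
```

Given the `CUDA Version: 10.0` that `nvidia-smi` reports, the driver side would be expected to come back as `(10, 0)` against a `(10, 1)` runtime. In that case `driver_too_old` is `True`. Later CUDA releases relaxed this rule with minor-version compatibility, so the simple comparison is only meant for the 10.x case here.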
If the following statement is true, why is my installation still broken, given that I've already installed cudatoolkit=10.1 in conda?
If you want to install a GPU driver, you could install a newer CUDA toolkit, which will have a newer GPU driver (installer) bundled with it.
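Are cudatoolkit and the CUDA driver still mismatched?
No, you cannot update the GPU driver through conda, and that is exactly what your situation needs: a driver that supports CUDA 10.1 or newer. Your 410.79 driver only supports up to CUDA 10.0, as the `CUDA Version` field in your `nvidia-smi` output shows. See here:
Anaconda requires that the user has installed a recent NVIDIA driver that meets the version requirements in the table below.
(The latest version of the table is here.)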
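The table boils down to a minimum driver version for each CUDA runtime. The sketch below encodes that check. Its numbers are assumptions based on the Linux column of NVIDIA's CUDA Toolkit release notes (10.0 needs 410.48, 10.1 needs 418.39, 10.2 needs 440.33) and should be verified against the current table. `MIN_DRIVER` and `driver_supports` are hypothetical names.

```python
from typing import Dict, Tuple

# Assumed Linux minimum driver versions per CUDA runtime; verify against
# NVIDIA's current CUDA Toolkit release notes before relying on them.
MIN_DRIVER: Dict[str, str] = {
    "10.0": "410.48",
    "10.1": "418.39",
    "10.2": "440.33",
}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn "418.87.00" into (418, 87, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_supports(driver: str, cuda: str) -> bool:
    """True if `driver` meets the assumed minimum for the CUDA runtime `cuda`."""
    return version_tuple(driver) >= version_tuple(MIN_DRIVER[cuda])
```

For the driver in the question, `driver_supports("410.79", "10.1")` returns `False` and `driver_supports("410.79", "10.0")` returns `True`. Together these reproduce the mismatch behind the error.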
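If you want to install a GPU driver, you could install a newer CUDA toolkit, which will have a newer GPU driver (installer) bundled with it. Alternatively, you can retrieve a driver here and install it. By "a newer CUDA toolkit" I mean the CUDA toolkit installer that NVIDIA provides, available here, not one installed through conda. You cannot update the driver through conda.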
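I suggest you study the CUDA Linux installation guide. The method you used to install the previous driver (runfile or package manager) is probably the one you'll want to use for the next driver.
As an alternative (for example, if you don't have, and can't get, administrator access to the system), you could look into CUDA forward compatibility. (This may also be relevant.)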
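If the driver cannot be updated, the newest CUDA version it supports limits which prebuilt TensorFlow GPU packages can run. The sketch below maps one to the other. It assumes the CUDA versions from TensorFlow's tested build configurations for Linux GPU: 2.0 was built with CUDA 10.0, 2.1 to 2.3 with 10.1, and 2.4 with 11.0. `TF_CUDA` and `usable_tf_versions` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Assumed CUDA build versions from TensorFlow's tested build configurations (Linux GPU).
TF_CUDA: Dict[str, Tuple[int, int]] = {
    "2.0": (10, 0),
    "2.1": (10, 1),
    "2.2": (10, 1),
    "2.3": (10, 1),
    "2.4": (11, 0),
}


def usable_tf_versions(driver_cuda: Tuple[int, int]) -> List[str]:
    """TensorFlow releases whose CUDA build version the driver can support."""
    return [release for release, cuda in TF_CUDA.items() if cuda <= driver_cuda]
```

With the CUDA 10.0 driver from the question, `usable_tf_versions((10, 0))` returns `["2.0"]`. TensorFlow 2.3/2.4 therefore needs either a driver update or the forward-compatibility route mentioned above.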