Jai*_*ton 47 python gpu tensorflow
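I'm trying to run some models on a GPU (NVIDIA GeForce RTX 3050) with tensorflow nightly 2.12 (so that I can use CUDA 12.0). The problem is that every check I run appears to pass, yet in the end my script cannot detect the GPU. I've spent a lot of time trying to understand what is happening, but nothing has worked, so any suggestion or solution is welcome. The GPU does seem to work with torch, as you can see at the very end of the question.
Below are some of the most common CUDA checks I ran (from the Visual Studio Code terminal); I hope you find them useful:
Check the CUDA version: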
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
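The release number in this output is the value worth comparing against whatever CUDA version a TensorFlow build expects. A minimal sketch of extracting it programmatically follows; the helper names `parse_nvcc_release` and `installed_toolkit_release` are hypothetical and not part of any library, and the regex assumes the "release X.Y" wording shown above.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(output: str) -> Optional[str]:
    """Return the 'release X.Y' version found in `nvcc --version` text, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_toolkit_release() -> Optional[str]:
    """Run nvcc if it is on PATH and return its release string, else None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


# The line below is copied from the nvcc output in the question.
sample = "Cuda compilation tools, release 12.0, V12.0.140"
print(parse_nvcc_release(sample))  # prints 12.0
```

Note that this only reports the toolkit whose `nvcc` comes first on PATH; it says nothing about which runtime libraries TensorFlow will actually load.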
Check that the path to the CUDA libraries is set correctly:
$ echo $LD_LIBRARY_PATH
/usr/cuda/lib
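An environment variable that is set does not by itself show that the expected libraries live in that directory. The sketch below lists which files on `LD_LIBRARY_PATH` match a few library name prefixes; the function name `scan_library_path` and the prefix list are assumptions chosen for illustration.

```python
import os
from pathlib import Path
from typing import Dict, List


def scan_library_path(ld_library_path: str, prefixes: List[str]) -> Dict[str, List[str]]:
    """Map each library prefix to the matching files found on the search path."""
    found: Dict[str, List[str]] = {prefix: [] for prefix in prefixes}
    for entry in ld_library_path.split(os.pathsep):
        directory = Path(entry)
        if not entry or not directory.is_dir():
            continue  # skip empty entries and directories that do not exist
        for prefix in prefixes:
            found[prefix].extend(sorted(str(p) for p in directory.glob(prefix + "*")))
    return found


if __name__ == "__main__":
    # Illustrative prefixes; adjust them to the libraries you care about.
    wanted = ["libcudart.so", "libcudnn.so", "libcublas.so"]
    report = scan_library_path(os.environ.get("LD_LIBRARY_PATH", ""), wanted)
    for prefix, paths in report.items():
        print(prefix, "->", paths or "not found on LD_LIBRARY_PATH")
```

The dynamic loader also consults the system cache and default directories, so a "not found" result here only means the library is absent from the directories named in `LD_LIBRARY_PATH`.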
Check the NVIDIA driver for the GPU and check that the GPU is visible from the venv:
$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 On | N/A |
| N/A 40C P5 6W / 20W | 46MiB / 4096MiB | 22% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1356 G /usr/lib/xorg/Xorg 45MiB |
+-----------------------------------------------------------------------------+
Add the cuda/bin path and check it:
$ export PATH="/usr/local/cuda/bin:$PATH"
$ echo $PATH
/usr/local/cuda-12.0/bin:/home/victus-linux/Escritorio/MasterThesis_CODE/to_share/venv_master/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
Custom function to check that CUDA is correctly installed: [function by Sherlock]
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
libcudart.so.12 -> libcudart.so.12.0.146
libcuda.so.1 -> libcuda.so.525.85.12
libcuda.so.1 -> libcuda.so.525.85.12
libcudadebugger.so.1 -> libcudadebugger.so.525.85.12
libcuda is installed
libcudart.so.12 -> libcudart.so.12.0.146
libcudart is installed
Custom function to check that cuDNN is correctly installed: [function by Sherlock]
/usr/cuda/lib
libcudnn_cnn_train.so.8 -> libcudnn_cnn_train.so.8.8.0
libcudnn_cnn_infer.so.8 -> libcudnn_cnn_infer.so.8.8.0
libcudnn_adv_train.so.8 -> libcudnn_adv_train.so.8.8.0
libcudnn.so.8 -> libcudnn.so.8.8.0
libcudnn_ops_train.so.8 -> libcudnn_ops_train.so.8.8.0
libcudnn_adv_infer.so.8 -> libcudnn_adv_infer.so.8.8.0
libcudnn_ops_infer.so.8 -> libcudnn_ops_infer.so.8.8.0
libcudnn is installed
Having completed the checks above, I ran a script to verify that everything finally works, and got the following error:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 On | N/A |
| N/A 40C P5 6W / 20W | 46MiB / 4096MiB | 22% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1356 G /usr/lib/xorg/Xorg 45MiB |
+-----------------------------------------------------------------------------+
2023-03-02 12:05:09.463343: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-03-02 12:05:09.489911: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-03-02 12:05:09.490522: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-03-02 12:05:10.066759: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Tensorflow version = 2.12.0-dev20230203
2023-03-02 12:05:10.748675: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2023-03-02 12:05:10.771263: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
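The "Cannot dlopen some GPU libraries" warning means the loader could not resolve at least one shared library that TensorFlow asked for. One way to see which ones fail is to try loading them directly with `ctypes`, as in this minimal sketch. The helper `try_dlopen` is hypothetical, and the sonames in `CANDIDATES` are an assumption (they correspond to a CUDA 11 / cuDNN 8 build) that should be adjusted to whatever your TensorFlow build expects.

```python
import ctypes
from typing import Dict, List


def try_dlopen(names: List[str]) -> Dict[str, str]:
    """Try to load each shared library; map its name to 'ok' or the loader error."""
    results: Dict[str, str] = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = "ok"
        except OSError as exc:
            results[name] = f"failed: {exc}"
    return results


# Assumed sonames for a CUDA 11 / cuDNN 8 build; not taken from the log above.
CANDIDATES = ["libcudart.so.11.0", "libcublas.so.11", "libcudnn.so.8", "libcuda.so.1"]

if __name__ == "__main__":
    for name, status in try_dlopen(CANDIDATES).items():
        print(f"{name}: {status}")
```

If a CUDA 11 soname fails while the CUDA 12 one is present on disk, that points to a version mismatch between the installed toolkit and the TensorFlow build rather than a broken installation.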
Extra check: I ran a check script with torch, and there it worked, so I guess the problem is related to tensorflow/tf-nightly.
/usr/local/cuda-12.0/bin:/home/victus-linux/Escritorio/MasterThesis_CODE/to_share/venv_master/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin
Available cuda = True
GPUs availables = 1
Current device = 0
Current Device location = <torch.cuda.device object at 0x7fbe26fd2ec0>
Name of the device = NVIDIA GeForce RTX 3050 Laptop GPU
If you know anything that might help solve this problem, please let me know.
ari*_*ero 31
I think that, as of March 2023, the only TensorFlow distribution built for CUDA 12 is NVIDIA's Docker package.
A TensorFlow package built for CUDA 12 should report the following:
>>> tf.sysconfig.get_build_info()
OrderedDict([('cpu_compiler', '/usr/bin/x86_64-linux-gnu-gcc-11'),
('cuda_compute_capabilities', ['compute_86']),
('cuda_version', '12.0'), ('cudnn_version', '8'),
('is_cuda_build', True), ('is_rocm_build', False), ('is_tensorrt_build', True)])
However, if we run tf.sysconfig.get_build_info() on any TensorFlow package installed via pip, it still reports cuda_version as 11.x.
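The comparison described here can be sketched as a small check that takes the build-info dictionary and the installed toolkit version. The helpers `cuda_major` and `describe_mismatch` are hypothetical names, and the `example` dictionary is illustrative data, not output captured from a real installation.

```python
from typing import Mapping, Optional


def cuda_major(version: object) -> Optional[int]:
    """Extract the major number from strings like '12.0' or '11.8'; None if absent."""
    if not version:
        return None
    head = str(version).split(".")[0]
    return int(head) if head.isdigit() else None


def describe_mismatch(build_info: Mapping[str, object], toolkit_version: str) -> str:
    """Compare the CUDA major version TensorFlow was built for with the installed one."""
    if not build_info.get("is_cuda_build", False):
        return "This TensorFlow build has no CUDA support."
    built = cuda_major(build_info.get("cuda_version"))
    installed = cuda_major(toolkit_version)
    if built is None or installed is None:
        return "Could not determine one of the versions."
    if built != installed:
        return f"Mismatch: built for CUDA {built}.x, toolkit is CUDA {installed}.x."
    return f"Match: both are CUDA {built}.x."


# Illustrative input; with TensorFlow installed, pass tf.sysconfig.get_build_info() instead.
example = {"is_cuda_build": True, "cuda_version": "11.8", "cudnn_version": "8"}
print(describe_mismatch(example, "12.0"))
```

Only the major version is compared, on the assumption that libraries from different CUDA major versions have different sonames and therefore cannot stand in for each other.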
So your options are:
小智 8
"I ran into the same thing; installing TensorRT fixed it."
Once all the libraries are fixed, the GPU shows up in the output. GPU visible:
小智 5
A simpler, more up-to-date solution: just install with the following command:
pip3 install tensorflow[and-cuda]
This installed the CUDA 11 libraries and TensorFlow without any problems for me (Ubuntu 22.04, RTX 4090).
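After installing, it may be worth confirming from Python that TensorFlow actually registers the GPU. The sketch below uses `tf.config.list_physical_devices("GPU")`; the wrapper `detect_tf_gpus` is a hypothetical helper that returns `None` when TensorFlow is not importable, so it can be run safely in any environment.

```python
from typing import List, Optional


def detect_tf_gpus() -> Optional[List[str]]:
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = detect_tf_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment.")
    elif gpus:
        print("GPUs visible to TensorFlow:", gpus)
    else:
        print("TensorFlow imported, but no GPU was registered.")
```

An empty list here corresponds to the `[]` printed at the end of the log in the question.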