WSL2 PyTorch - RuntimeError: No CUDA GPUs are available with an RTX 3080

har*_*old 11 pytorch

I've spent the whole day trying to get torch working on WSL2 with an RTX 3080.

I installed CUDA toolkit version 11.3.

Running nvcc -V returns this:

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0

nvidia-smi returns this:

Mon Nov 29 00:38:26 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.00       Driver Version: 510.06       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   52C    P5    17W /  N/A |   1082MiB / 16384MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I verified the toolkit installation with the BlackScholes sample:

./BlackScholes
[./BlackScholes] - Starting...
GPU Device 0: "Ampere" with compute capability 8.6

Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.

Executing Black-Scholes GPU kernel (512 iterations)...
Options count             : 8000000
BlackScholesGPU() time    : 0.242822 msec
Effective memory bandwidth: 329.459087 GB/s
Gigaoptions per second    : 32.945909

BlackScholes, Throughput = 32.9459 GOptions/s, Time = 0.00024 s, Size = 8000000 options, NumDevsUsed = 1, Workgroup = 128

Reading back GPU results...
Checking the results...
...running CPU calculations.

Comparing the results...
L1 norm: 1.741792E-07
Max absolute error: 1.192093E-05

Shutting down...
...releasing GPU memory.
...releasing CPU memory.
Shutdown done.

[BlackScholes] - Test Summary

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Test passed

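When I try to use torch, it can't find any GPU. By the way, to use torch on the RTX 3080 I have to install torch==1.10.0+cu113, because the sm_ architectures in the plain 1.10.0 build are not compatible with the RTX 3080.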
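The +cu113 suffix matters because compute capability 8.6 (which the BlackScholes output reports for this card) was first supported in CUDA 11.1. A minimal sketch of that check, assuming the pip wheel convention of a local version tag such as +cu113 in torch.__version__; the helpers cuda_tag and wheel_supports_sm86 are hypothetical, and a version string without such a tag (conda builds, for example, may not carry one) yields None rather than a guess.

```python
import re


def cuda_tag(version):
    """Parse a '+cuXYZ' local tag (hypothetical helper) into (major, minor), or None."""
    match = re.search(r"\+cu(\d+)$", version)
    if match is None:
        return None
    digits = match.group(1)
    # 'cu113' -> (11, 3), 'cu102' -> (10, 2): the last digit is the minor version.
    return int(digits[:-1]), int(digits[-1])


def wheel_supports_sm86(version):
    """True/False if the tag tells us, None if the version string has no CUDA tag."""
    tag = cuda_tag(version)
    if tag is None:
        return None
    # Compute capability 8.6 (GA10x, e.g. RTX 3080) needs CUDA 11.1 or newer.
    return tag >= (11, 1)
```

With torch installed, one could call wheel_supports_sm86(torch.__version__); for the build mentioned above, "1.10.0+cu113", it returns True.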

Running torch returns the following:

>>> import torch
>>> torch.version
<module 'torch.version' from '/home/snihar/miniconda3/envs/tscbasset/lib/python3.7/site-packages/torch/version.py'>
>>> torch.version.cuda
'11.3'
>>> torch.cuda.get_arch_list()
[]
>>> torch.cuda.device_count()
0
>>> torch.cuda.current_device()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/snihar/miniconda3/envs/tscbasset/lib/python3.7/site-packages/torch/cuda/__init__.py", line 479, in current_device
    _lazy_init()
  File "/home/snihar/miniconda3/envs/tscbasset/lib/python3.7/site-packages/torch/cuda/__init__.py", line 214, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
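When torch.cuda.device_count() returns 0, it can help to gather the relevant facts in one place before changing versions. A minimal diagnostic sketch, assuming only standard PyTorch calls (torch.cuda.is_available, device_count, get_device_name, get_device_capability, get_arch_list); the function name cuda_report is hypothetical. It also records CUDA_VISIBLE_DEVICES, since an empty or invalid value there hides every GPU from the CUDA runtime, and it only queries individual devices when CUDA reports one, so it does not hit the RuntimeError shown above.

```python
import os
from importlib.util import find_spec


def cuda_report():
    """Collect PyTorch/CUDA diagnostics without calling device-specific APIs blindly."""
    # An empty or invalid CUDA_VISIBLE_DEVICES hides all GPUs from the CUDA runtime.
    report = {"CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES")}
    if find_spec("torch") is None:
        report["torch_installed"] = False
        return report

    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # CUDA version the package was built against (None for CPU-only builds).
    report["built_with_cuda"] = torch.version.cuda
    # Neither call raises when no GPU is visible; they return False and 0.
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    if report["cuda_available"]:
        # Only query device 0 when CUDA reports at least one usable device.
        report["device_name"] = torch.cuda.get_device_name(0)
        report["capability"] = torch.cuda.get_device_capability(0)
        report["arch_list"] = torch.cuda.get_arch_list()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Running this both inside WSL2 and in the working Docker container would show which of these values differ between the two environments.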

Finally, and interestingly, I can run tensorflow-gpu on the same machine without any problems.

I installed PyTorch like this: conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

Also, I managed to run PyTorch in a Docker container launched from the WSL2 machine with the following command:

sudo docker run --gpus all -it --rm -v /home/...:/home/... nvcr.io/nvidia/pytorch:21.11-py3

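Running PyTorch directly on the Windows machine that hosts WSL also works. In both cases (the container and Windows), torch.cuda.get_arch_list() returns ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37'], which indicates that the library is compatible with the RTX 3080.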
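Compatibility can be read off that list: a compiled sm_XY kernel runs on devices with the same major compute capability and an equal or higher minor version, and the RTX 3080 reports 8.6 in the BlackScholes output. A simplified sketch of that check, assuming it is enough to look at sm_ entries (it ignores compute_XY PTX entries, which can be JIT-compiled for newer GPUs); arch_list_supports is a hypothetical helper, not a PyTorch API.

```python
import re


def arch_list_supports(arch_list, capability):
    """Return True if some sm_XY entry can run on a device of (major, minor) capability."""
    major, minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)", arch)
        if match is None:
            continue  # skip compute_XY (PTX) entries and anything unexpected
        sm = int(match.group(1))
        # Binary compatibility: same major version, equal or lower minor version.
        if sm // 10 == major and sm % 10 <= minor:
            return True
    return False


arch_list = ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75',
             'sm_80', 'sm_86', 'compute_37']
print(arch_list_supports(arch_list, (8, 6)))
```

With PyTorch one might pass torch.cuda.get_arch_list() and torch.cuda.get_device_capability(0). Note that in PyTorch 1.10 get_arch_list() returns an empty list whenever CUDA is unavailable, as in the REPL session in the question, so an empty list there does not by itself show that the build lacks sm_86.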

ska*_*t76 3

I had the same problem and solved it by downgrading PyTorch from 1.10 to 1.8.2 LTS.