GPU RAM occupied but no PIDs

Shi*_*hah 1 ram gpu nvidia

nvidia-smi reports 3771MiB in use on GPU0, yet no process is listed for GPU0:

(base) ~/.../fast-autoaugment$ nvidia-smi
Fri Dec 20 13:48:12 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:03:00.0 Off |                  N/A |
| 23%   34C    P8     9W / 250W |   3771MiB / 12196MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 00000000:84:00.0  On |                  N/A |
| 38%   62C    P8    24W / 250W |   2295MiB / 12188MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1      1910      G   /usr/lib/xorg/Xorg                           105MiB |
|    1      2027      G   /usr/bin/gnome-shell                          51MiB |
|    1      3086      G   /usr/lib/xorg/Xorg                          1270MiB |
|    1      3237      G   /usr/bin/gnome-shell                         412MiB |
|    1     30593      G   /proc/self/exe                               286MiB |
|    1     31849      G   ...quest-channel-token=4371017438329004833   164MiB |
+-----------------------------------------------------------------------------+

Similarly, nvtop shows the same GPU RAM usage, but the processes it lists have TYPE=Compute, and trying to kill any of the PIDs it shows fails with an error:

(base) ~/.../fast-autoaugment$ kill 27761
bash: kill: (27761) - No such process

How can I reclaim the GPU RAM held by what are apparently ghost processes?

Shi*_*hah 6

Use the following command to see which ghost processes are holding GPU RAM:

sudo fuser -v /dev/nvidia*

In my case, the output was:

(base) ~/.../fast-autoaugment$ sudo fuser -v /dev/nvidia*
                     USER        PID ACCESS COMMAND
/dev/nvidia0:        shitals     517 F.... nvtop
                     root       1910 F...m Xorg
                     gdm        2027 F.... gnome-shell
                     root       3086 F...m Xorg
                     shitals    3237 F.... gnome-shell
                     shitals   27808 F...m python
                     shitals   27809 F...m python
                     shitals   27813 F...m python
                     shitals   27814 F...m python
                     shitals   28091 F...m python
                     shitals   28092 F...m python
                     shitals   28096 F...m python
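Once /dev/nvidiactl, /dev/nvidia-uvm and the other nodes matched by /dev/nvidia* are included, this listing can get long. A minimal sketch for pulling out the PIDs of a single command, assuming the column layout shown above (COMMAND is the last field, PID sits two fields before it). As far as I know fuser writes its verbose table to stderr, hence the 2>&1 in the usage line. The function name pids_for_command is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `fuser -v` output on stdin and print the unique
# PIDs whose COMMAND column equals $1. Assumes the layout shown above: COMMAND
# is the last field and PID is two fields before it. That holds both for the
# first line of each device (which starts with "/dev/nvidiaN:") and for the
# continuation lines underneath it. The header line never matches a real name.
pids_for_command() {
  local cmd="$1"
  awk -v cmd="$cmd" '$NF == cmd { print $(NF - 2) }' | sort -nu
}
```

For example, `sudo fuser -v /dev/nvidia* 2>&1 | pids_for_command python` should print the seven python PIDs from the listing above, one per line, with duplicates across device nodes removed.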

The fuser listing shows processes that nvidia-smi and nvtop fail to show. After I killed all of the python processes, the GPU RAM was freed.
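Killing the processes can be scripted. The sketch below sends SIGTERM first, which gives them a chance to shut down cleanly, and only sends SIGKILL to whatever is still alive after a grace period. terminate_pids and GRACE_SECONDS are hypothetical names, and the 5-second default is an arbitrary choice. Only pass PIDs you own and recognize: the listing above also contains Xorg and gnome-shell, and killing those ends the desktop session.

```bash
#!/usr/bin/env bash
# Hypothetical helper: send SIGTERM to every PID given as an argument, wait up
# to GRACE_SECONDS (default 5, an arbitrary choice) for them to exit, then send
# SIGKILL to whatever is still running.
terminate_pids() {
  [ "$#" -eq 0 ] && return 0
  local grace="${GRACE_SECONDS:-5}"
  local pid alive i

  kill -TERM "$@" 2>/dev/null || true

  # Poll every 0.1 s. `kill -0` only checks that the PID still exists; it also
  # succeeds for exited-but-unreaped zombies, which then just get a harmless
  # SIGKILL below.
  for ((i = 0; i < grace * 10; i++)); do
    alive=0
    for pid in "$@"; do
      if kill -0 "$pid" 2>/dev/null; then
        alive=1
        break
      fi
    done
    [ "$alive" -eq 0 ] && return 0
    sleep 0.1
  done

  # Grace period expired: force-kill the survivors.
  kill -KILL "$@" 2>/dev/null || true
  return 0
}
```

For the listing above this would be run from a root shell (the processes belong to another user) as `terminate_pids 27808 27809 27813 27814 28091 28092 28096`.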

Another thing to try is resetting the GPU with the following command:

sudo nvidia-smi --gpu-reset -i 0
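The nvidia-smi documentation states that --gpu-reset requires root and that no application may be using the device, so it can help to check for remaining holders before trying it. The sketch below does roughly what fuser does, resolving each /proc/<pid>/fd entry and comparing it with the device node, and it needs to run as root to see other users' processes. holders_of is a hypothetical name, and treating /dev/nvidia0 as the node for index 0 is an assumption worth checking against the Minor Number field in `nvidia-smi -q`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the unique PIDs that have the file or device
# node $1 open, by resolving every /proc/<pid>/fd/* symlink. Processes whose
# fd directory we may not read (other users' processes, unless run as root)
# are skipped silently, which is why the answer runs fuser with sudo.
holders_of() {
  local target fd pid
  target="$(readlink -f -- "$1")" || return 1
  for fd in /proc/[0-9]*/fd/*; do
    if [ "$(readlink -- "$fd" 2>/dev/null)" = "$target" ]; then
      pid="${fd#/proc/}"
      echo "${pid%%/*}"
    fi
  done | sort -nu
}

# Example guard before a reset (assumes GPU index 0 maps to /dev/nvidia0):
#   if [ -z "$(holders_of /dev/nvidia0)" ]; then
#     nvidia-smi --gpu-reset -i 0
#   else
#     echo "still in use by: $(holders_of /dev/nvidia0 | tr '\n' ' ')" >&2
#   fi
```

Display processes such as Xorg will also show up here if they keep the device node open, in which case the reset cannot go ahead while they run.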