nvidia-smi shows 3.77 GB in use on GPU 0, but no processes are listed for GPU 0:
(base) ~/.../fast-autoaugment$ nvidia-smi
Fri Dec 20 13:48:12 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50 Driver Version: 430.50 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:03:00.0 Off | N/A |
| 23% 34C P8 9W / 250W | 3771MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp Off | 00000000:84:00.0 On | N/A |
| 38% 62C P8 24W / 250W | 2295MiB / 12188MiB | 8% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 1 1910 G /usr/lib/xorg/Xorg 105MiB |
| 1 2027 G /usr/bin/gnome-shell 51MiB |
| 1 3086 G /usr/lib/xorg/Xorg 1270MiB |
| 1 3237 G /usr/bin/gnome-shell 412MiB |
| 1 30593 G /proc/self/exe 286MiB |
| 1 31849 G ...quest-channel-token=4371017438329004833 164MiB |
+-----------------------------------------------------------------------------+
Similarly, nvtop shows the same GPU RAM utilization, but the processes it lists have TYPE=Compute, and if you try to kill a PID it shows, you get an error:
(base) ~/.../fast-autoaugment$ kill 27761
bash: kill: (27761) - No such process
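How can I reclaim the GPU RAM that is apparently being held by ghost processes?
Use the following command to get more insight into the ghost processes occupying GPU RAM: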
sudo fuser -v /dev/nvidia*
In my case, the output was:
(base) ~/.../fast-autoaugment$ sudo fuser -v /dev/nvidia*
USER PID ACCESS COMMAND
/dev/nvidia0: shitals 517 F.... nvtop
root 1910 F...m Xorg
gdm 2027 F.... gnome-shell
root 3086 F...m Xorg
shitals 3237 F.... gnome-shell
shitals 27808 F...m python
shitals 27809 F...m python
shitals 27813 F...m python
shitals 27814 F...m python
shitals 28091 F...m python
shitals 28092 F...m python
shitals 28096 F...m python
This shows processes that nvidia-smi and nvtop do not display. After I killed all the python processes, the GPU RAM was freed.
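Killing those PIDs one by one gets tedious, so here is a rough sketch of how the step could be scripted. The function name gpu_pids_for_command is made up for illustration and is not part of any tool. It assumes the fuser -v table layout shown above, where the PID is the third column from the end, and a command name without spaces:

```shell
# Hypothetical helper (not part of fuser or nvidia-smi): reads `fuser -v`
# output on stdin and prints each PID whose COMMAND column equals $1.
gpu_pids_for_command() {
    # A row is either "USER PID ACCESS COMMAND" or
    # "/dev/nvidiaN: USER PID ACCESS COMMAND", so the PID is always the
    # third field from the end. The numeric check skips the header row.
    # sort -un drops duplicates when one PID holds several device files.
    awk -v cmd="$1" 'NF >= 4 && $NF == cmd && $(NF-2) ~ /^[0-9]+$/ { print $(NF-2) }' | sort -un
}

# Example usage, left commented out because it needs sudo and a GPU driver.
# fuser -v writes its table to stderr, hence the 2>&1.
# sudo fuser -v /dev/nvidia* 2>&1 | gpu_pids_for_command python
# sudo fuser -v /dev/nvidia* 2>&1 | gpu_pids_for_command python | xargs -r kill
```

Run the first usage line on its own and check the PID list before piping it into kill, since it matches every process named python that has an NVIDIA device file open, not only the stale ones.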
Another thing to try is resetting the GPU with the following command:
sudo nvidia-smi --gpu-reset -i 0