Dan*_*ong 7 cuda resource-monitor
The normal output of nvidia-smi looks like this:
Thu May 10 09:05:07 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 61%   74C    P2   195W / 250W |   5409MiB / 11172MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      5973      C   ...master_JPG/build/tools/program_pytho.bin 4862MiB |
|    0     46324      C   python                                       537MiB |
+-----------------------------------------------------------------------------+
As you can see, it shows the list of PIDs of the running processes. But I would also like to know who owns each PID. Can I customize the output so that it shows the username for each PID? I already know how to show the username for a single PID:
ps -u -p $pid
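What I am after is essentially gluing those two pieces together, roughly like the sketch below (the awk column number is just an assumption based on the table layout above):

# Rough sketch of the goal: look up the owner of every listed GPU compute process.
# Assumes the PID is the 3rd whitespace-separated field of the process rows (type C).
for pid in $(nvidia-smi | awk '/ C / {print $3}'); do
    ps -u -p "$pid"
done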
Please help me. Thank you very much.
Update: I have posted the solution that works for me below. I have also uploaded it to GitHub as a simple script for anyone who needs detailed GPU information.
小智 12
I did it with nvidia-smi -q -x, the XML-style output of nvidia-smi:
ps -up `nvidia-smi -q -x | grep pid | sed -e 's/<pid>//g' -e 's/<\/pid>//g' -e 's/^[[:space:]]*//'`
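If parsing the XML with grep and sed feels brittle, nvidia-smi also has a plain query mode that prints just the compute-process PIDs; a minimal sketch of the same idea (assuming your driver is new enough to support --query-compute-apps):

# Print only the PIDs of compute processes, one per line, then hand them to ps.
nvidia-smi --query-compute-apps=pid --format=csv,noheader | xargs -r ps -up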
Mar*_*cka 10
I created a script that takes the nvidia-smi output and enriches it with more information: https://github.com/peci1/nvidia-htop. It is a Python script that parses the GPU process list, takes the PIDs, runs ps on them to gather more details, and then substitutes the enriched listing back into the nvidia-smi process list.
Example usage:
$ nvidia-smi | nvidia-htop.py -l
Mon May 21 15:06:35 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25                 Driver Version: 390.25                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
| 53%   75C    P2   174W / 250W |  10807MiB / 11178MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:05:00.0 Off |                  N/A |
| 66%   82C    P2   220W / 250W |  10783MiB / 11178MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:08:00.0 Off |                  N/A |
| 45%   67C    P2    85W / 250W |  10793MiB / 11178MiB |     51%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
|  GPU   PID USER      GPU MEM %MEM %CPU COMMAND                              |
|    0  1032 anonymou 10781MiB  308  3.7 python train_image_classifier.py --train_dir=/mnt/xxxxxxxx/xxxxxxxx/xxxxxxxx/xxxxxxx/xxxxxxxxxxxxxxx |
|    1 11021 cannotte 10765MiB  114  1.5 python3 ./train.py --flagfile /xxxxxxxx/xxxxxxxx/xxxxxxxx/xxxxxxxxx/xx/xxxxxxxxxxxxxxx |
|    2 25544 nevermin 10775MiB  108  2.0 python -m xxxxxxxxxxxxxxxxxxxxxxxxxxxxx |
+-----------------------------------------------------------------------------+
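For reference, the pipe is not strictly required; per my reading of the project's README the script can invoke nvidia-smi itself, and there is also a PyPI package (both of these are assumptions, so check the repository for the current instructions):

# Assumed install/usage taken from the nvidia-htop README; verify against the repo.
pip3 install nvidia-htop
nvidia-htop.py -l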
The previous solutions did not work for me, so I am posting my own here. The NVIDIA-SMI version I am using is 440.44, but I do not think that matters.
nvidia-smi | tee /dev/stderr | awk '/ C / {print $3}' | xargs -r ps -up
A bit of explanation:

- tee: avoids calling nvidia-smi twice
- awk: grabs the PID column of the compute processes (type C)
- xargs -r: the -r checks whether the input is empty, so that ps -up does not print an unwanted error message

If you want to turn it into an alias in your .bash_profile or .bashrc:
alias nvidia-smi2='nvidia-smi | tee /dev/stderr | awk "/ C / {print \$3}" | xargs -r ps -up'
The only difference is that $3 has to be escaped first.
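Alternatively, a small shell function in .bashrc sidesteps the quoting issue entirely, because the pipeline stays exactly as typed on the command line (just a sketch wrapping the same pipeline):

# Same pipeline wrapped in a function; no extra escaping of $3 is needed.
nvidia-smi2() {
    nvidia-smi | tee /dev/stderr | awk '/ C / {print $3}' | xargs -r ps -up
}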