How to customize nvidia-smi output to show the username of each PID

Dan*_*ong 7 cuda resource-monitor

The normal output of nvidia-smi looks like this:

Thu May 10 09:05:07 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.111                Driver Version: 384.111                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 61%   74C    P2   195W / 250W |   5409MiB / 11172MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      5973      C   ...master_JPG/build/tools/program_pytho.bin  4862MiB |
|    0     46324      C   python                                       537MiB |
+-----------------------------------------------------------------------------+

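As you can see, it lists the PIDs of the processes running on the GPU. However, I also want to know which user owns each PID. Can I customize the output to show the username for each PID? I already know how to show the username for a single PID: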

ps -u -p $pid

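Please help. Thank you very much.

Update: I have posted the solution that worked for me. I also uploaded it to GitHub as a simple script for anyone who needs detailed GPU information: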

https://github.com/ManhTruongDang/check-gpu

Anonymous 12

I did it with nvidia-smi -q -x, which is the XML-style output of nvidia-smi:

ps -up `nvidia-smi -q -x | grep pid | sed -e 's/<pid>//g' -e 's/<\/pid>//g' -e 's/^[[:space:]]*//'`
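For comparison, the same PID extraction can be done with an XML parser instead of grep and sed. This is only a minimal Python sketch, assuming the XML report contains pid elements as the command above implies. The function name pids_from_xml and the sample document are made up for illustration, and a real report contains many more fields.

```python
import xml.etree.ElementTree as ET


def pids_from_xml(xml_text):
    """Return the text of every <pid> element, in document order."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the exact nesting of <pid> does not matter.
    return [el.text.strip() for el in root.iter("pid") if el.text and el.text.strip()]


# Hypothetical, heavily trimmed sample; not real nvidia-smi output.
SAMPLE_XML = """
<nvidia_smi_log>
  <gpu>
    <processes>
      <process_info><pid>5973</pid></process_info>
      <process_info><pid>46324</pid></process_info>
    </processes>
  </gpu>
</nvidia_smi_log>
"""

if __name__ == "__main__":
    print(pids_from_xml(SAMPLE_XML))
```

On a machine with the driver installed, the XML text could be captured with subprocess.run(["nvidia-smi", "-q", "-x"], capture_output=True, text=True).stdout, and the resulting PIDs passed on to ps -up.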


Mar*_*cka 10

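I created a script that takes the nvidia-smi output and enriches it with more information: https://github.com/peci1/nvidia-htop.

It is a Python script that parses the GPU process list, extracts the PIDs, runs them through ps to gather more information, and then replaces the process list in the nvidia-smi output with the enriched listing.

Usage example: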

$ nvidia-smi | nvidia-htop.py -l
Mon May 21 15:06:35 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25                 Driver Version: 390.25                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
| 53%   75C    P2   174W / 250W |  10807MiB / 11178MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:05:00.0 Off |                  N/A |
| 66%   82C    P2   220W / 250W |  10783MiB / 11178MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:08:00.0 Off |                  N/A |
| 45%   67C    P2    85W / 250W |  10793MiB / 11178MiB |     51%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
|  GPU   PID     USER    GPU MEM  %MEM  %CPU  COMMAND                                                                                               |
|    0  1032 anonymou   10781MiB   308   3.7  python train_image_classifier.py --train_dir=/mnt/xxxxxxxx/xxxxxxxx/xxxxxxxx/xxxxxxx/xxxxxxxxxxxxxxx  |
|    1 11021 cannotte   10765MiB   114   1.5  python3 ./train.py --flagfile /xxxxxxxx/xxxxxxxx/xxxxxxxx/xxxxxxxxx/xx/xxxxxxxxxxxxxxx                |
|    2 25544 nevermin   10775MiB   108   2.0  python -m xxxxxxxxxxxxxxxxxxxxxxxxxxxxx                                                               |
+-----------------------------------------------------------------------------+
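The general idea (parse the process rows, then look up each PID's owner) can be sketched in a few lines of Python. This is not the nvidia-htop implementation: it assumes the process-table layout shown in the question (GPU, PID, Type, Process name, Usage), and it reads the owner from /proc on Linux rather than calling ps. The names ROW, owner_of and enrich are hypothetical.

```python
import os
import pwd
import re

# Matches rows such as "|    0      5973      C   ...  4862MiB |"
# (GPU index, PID, type, process name, GPU memory).
ROW = re.compile(
    r"^\|\s+(\d+)\s+(\d+)\s+([CG](?:\+[CG])?)\s+(.*?)\s+(\d+MiB)\s*\|\s*$"
)


def owner_of(pid):
    """Best-effort owner lookup via /proc (Linux only); '?' if unavailable."""
    try:
        uid = os.stat(f"/proc/{pid}").st_uid
    except OSError:
        return "?"
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)  # UID without a passwd entry


def enrich(smi_text, lookup=owner_of):
    """Yield (gpu, pid, user, gpu_mem) for each process row in smi_text."""
    for line in smi_text.splitlines():
        m = ROW.match(line)
        if m:
            gpu, pid, _type, _name, mem = m.groups()
            yield int(gpu), int(pid), lookup(int(pid)), mem


# Process rows copied from the question's sample output.
SAMPLE_ROWS = """\
|    0      5973      C   ...master_JPG/build/tools/program_pytho.bin  4862MiB |
|    0     46324      C   python                                       537MiB |
"""

if __name__ == "__main__":
    for row in enrich(SAMPLE_ROWS):
        print(row)
```

The lookup parameter is there so the parsing can be exercised without real GPU processes. Driver versions that print a different process table would need a different pattern.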


Iro*_*ron 2

The previous solutions did not work for me, so I am posting my own here. The NVIDIA-SMI version I am using is 440.44, but I do not think that matters.

nvidia-smi | tee /dev/stderr | awk '/ C / {print $3}' | xargs -r ps -up

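A little explanation:

  • tee: avoids calling nvidia-smi twice
  • awk: grabs the PID column of the compute processes (type C)
  • xargs -r: -r checks whether the input is empty, to avoid an unwanted error message from ps -up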
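To make the awk and xargs steps concrete, here is a small Python sketch that mimics them on captured text. It assumes the table layout from the question, where the leading "|" is field 1, the GPU index is field 2 and the PID is field 3. The function names compute_pids and ps_command are made up for illustration.

```python
def compute_pids(smi_text):
    """Mimic awk '/ C / {print $3}' on the given text."""
    pids = []
    for line in smi_text.splitlines():
        if " C " in line:  # rows whose Type column is C (compute)
            fields = line.split()  # fields[0] is "|", fields[1] the GPU index
            if len(fields) >= 3:
                pids.append(fields[2])
    return pids


def ps_command(pids):
    """Mimic xargs -r: build the ps command only if there is at least one PID."""
    if not pids:
        return None  # nothing to run, so ps never sees an empty PID list
    return ["ps", "-up", *pids]


# Lines copied from the question's sample output.
SAMPLE = """\
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| 61%   74C    P2   195W / 250W |   5409MiB / 11172MiB |    100%      Default |
|  GPU       PID   Type   Process name                             Usage      |
|    0      5973      C   ...master_JPG/build/tools/program_pytho.bin  4862MiB |
|    0     46324      C   python                                       537MiB |
"""

if __name__ == "__main__":
    print(ps_command(compute_pids(SAMPLE)))
```

If a driver version prints extra columns before the PID, the field index would have to change accordingly, in this sketch and in the awk expression alike.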

If you want to make it an alias in .bash_profile or .bashrc:

alias nvidia-smi2='nvidia-smi | tee /dev/stderr | awk "/ C / {print \$3}" | xargs -r ps -up'

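The difference is that $3 has to be escaped as \$3. Because the awk program is now inside double quotes, the shell would otherwise expand $3 itself before awk ever sees it.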