vulkaninfo fails in Docker and does not recognize the NVIDIA GPU

f1s*_*hel 10 ubuntu nvidia docker vulkan

Problem description

When I run vulkaninfo in Docker, it complains:

Cannot create Vulkan instance.
This problem is often caused by a faulty installation of the Vulkan driver or attempting to use a GPU that does not support Vulkan.
ERROR at /build/vulkan-tools-1.3.204.0~rc3-1lunarg20.04/vulkaninfo/vulkaninfo.h:649:vkCreateInstance failed with ERROR_INCOMPATIBLE_DRIVER

This looks like a driver problem, so I ran nvidia-smi to check:

Sun Apr 17 03:12:54 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:68:00.0 Off |                  N/A |
| 32%   37C    P8    12W / 250W |     18MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1550      G                                      16MiB |
+-----------------------------------------------------------------------------+

The driver appears to work fine. I also checked the NVIDIA_DRIVER_CAPABILITIES environment variable and ran lspci|grep -i vga; the output appears in the reproduction details below.
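For reference, a minimal sketch of how the NVIDIA_DRIVER_CAPABILITIES value could be checked inside the container. It assumes the variable holds either all or a comma-separated list, and that the graphics capability is the one the NVIDIA container runtime needs in order to expose the OpenGL/Vulkan driver libraries. The helper name has_graphics_cap is hypothetical and not part of any NVIDIA tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (status 0) if a NVIDIA_DRIVER_CAPABILITIES
# value requests graphics support, i.e. it is "all" or a comma-separated
# list that contains "graphics".
has_graphics_cap() {
    case ",$1," in
        *,all,*|*,graphics,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on the value seen by the current shell (empty if unset).
caps="${NVIDIA_DRIVER_CAPABILITIES:-}"
if has_graphics_cap "$caps"; then
    echo "graphics capability requested: $caps"
else
    echo "graphics capability NOT requested: ${caps:-<unset>}"
fi
```

A passing check only shows that graphics support was requested; it does not prove that the driver libraries were actually mounted into the container.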

If I install mesa-vulkan-drivers, vulkaninfo works, but it does not recognize the NVIDIA GPU.
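One way to narrow this down is to look at which ICD manifests the Vulkan loader can see, since the loader discovers drivers through JSON manifest files. The sketch below lists the manifests in a few common search directories and reports whether any of them looks like an NVIDIA one. The function name list_icd_manifests is hypothetical, the directory list is an assumption that covers only some of the loader's search paths, and matching on "nvidia" in the filename assumes the usual nvidia_icd.json naming.

```shell
#!/bin/sh
# Hypothetical helper: print every *.json manifest found in the given
# directories, skipping directories that do not exist.
list_icd_manifests() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        for f in "$dir"/*.json; do
            # An unmatched glob stays literal, so test that the file exists.
            if [ -e "$f" ]; then
                echo "$f"
            fi
        done
    done
    return 0
}

# Assumed subset of the loader's ICD search directories on Linux.
manifests=$(list_icd_manifests \
    /etc/vulkan/icd.d \
    /usr/share/vulkan/icd.d \
    /usr/local/share/vulkan/icd.d)

if echo "$manifests" | grep -qi nvidia; then
    echo "NVIDIA ICD manifest found:"
else
    echo "No NVIDIA ICD manifest found; the loader can only see:"
fi
echo "${manifests:-<none>}"
```

If only Mesa manifests show up, that would be consistent with vulkaninfo working after installing mesa-vulkan-drivers while still not listing the NVIDIA GPU.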

Reproduction details

Host system information:

  • OS: Ubuntu 16.04 xenial
  • Kernel: x86_64 Linux 4.4.0-210-generic
  • CPU: Intel Core i9-10940X CPU @ 4.8GHz
  • GPU: NVIDIA GeForce RTX 2080 Ti x 4 (driver 510.60.02)

Docker information:

  • Version: 20.10.7
  • nvidia-container-toolkit version: 1.9.0-1

Docker start command:

$ echo ${NVIDIA_DRIVER_CAPABILITIES}
all
$ lspci|grep -i vga
19:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
1a:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
67:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
68:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)

Dockerfile:

FROM nvidia/cudagl:11.4.2-base-ubuntu20.04
ENV NVIDIA_DRIVER_CAPABILITIES compute,graphics,utility
ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update && apt-get install -y --no-install-recommends \
    libx11-xcb-dev \
    libxkbcommon-dev \
    libwayland-dev \
    libxrandr-dev \
    libegl1-mesa-dev \
    wget && \
    rm -rf /var/lib/apt/lists/*

RUN wget -O - http://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
    wget -O /etc/apt/sources.list.d/lunarg-vulkan-focal.list http://packages.lunarg.com/vulkan/lunarg-vulkan-focal.list && \
    apt update && \
    apt install -y vulkan-sdk

EDIT1: I am working on a headless server over ssh, and I want to do offline rendering in Docker.