Unable to use GPU on Minikube with the Docker driver

Str*_*ter 5 gpu docker minikube nvidia-docker

Goal:

I am trying to use Nvidia GPU capabilities on a Minikube cluster that uses the default Docker driver.

Problem:

I can use nvidia-docker with the default docker context, but after switching to the minikube context with minikube docker-env, I get the following error:

    $ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
    docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
    ERRO[0000] error waiting for container: context canceled
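For anyone reproducing this, it can help to confirm which daemon the docker client is pointed at before and after switching. The sketch below relies only on the fact that minikube docker-env exports DOCKER_HOST; the helper name docker_target is hypothetical, and it assumes no non-default docker context is in use.

```shell
# Hypothetical helper: report which Docker daemon the client will talk to.
# `eval "$(minikube docker-env)"` exports DOCKER_HOST (among other variables),
# so a non-empty value means the client follows DOCKER_HOST rather than the
# default local socket.
docker_target() {
  if [ -n "${DOCKER_HOST:-}" ]; then
    echo "DOCKER_HOST daemon at ${DOCKER_HOST}"
  else
    echo "default host daemon"
  fi
}

docker_target
```

Running eval "$(minikube docker-env)" and then docker_target should show the switch, and eval "$(minikube docker-env -u)" undoes it. The differing client and server versions in the docker version output are another sign that a different daemon is answering.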

Environment:

  • Ubuntu 18.04
  • Minikube v1.10.0
  • Docker version:

    $ docker version
    Client: Docker Engine - Community
     Version:           19.03.10
     API version:       1.40
     Go version:        go1.13.10
     Git commit:        9424aeaee9
     Built:             Thu May 28 22:16:49 2020
     OS/Arch:           linux/amd64
     Experimental:      false

    Server:
     Engine:
      Version:          19.03.2
      API version:      1.40 (minimum version 1.12)
      Go version:       go1.12.9
      Git commit:       6a30dfca03
      Built:            Wed Sep 11 22:45:55 2019
      OS/Arch:          linux/amd64
      Experimental:     false
     containerd:
      Version:          v1.3.3-14-g449e9269
      GitCommit:        449e926990f8539fd00844b26c07e2f1e306c760
     runc:
      Version:          1.0.0-rc10
      GitCommit:
     docker-init:
      Version:          0.18.0
      GitCommit:

  • Nvidia container runtime version:

    $ nvidia-container-runtime --version
    runc version 1.0.0-rc10
    commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
    spec: 1.0.1-dev

Additional information:

The cluster was created with the following command:

    minikube start --cpus 3 --memory 8G

The following addons are currently enabled in minikube:

    $ minikube addons list
    |-----------------------------|----------|--------------|
    |         ADDON NAME          | PROFILE  |    STATUS    |
    |-----------------------------|----------|--------------|
    | dashboard                   | minikube | disabled     |
    | default-storageclass        | minikube | enabled ✅    |
    | efk                         | minikube | disabled     |
    | freshpod                    | minikube | disabled     |
    | gvisor                      | minikube | disabled     |
    | helm-tiller                 | minikube | disabled     |
    | ingress                     | minikube | disabled     |
    | ingress-dns                 | minikube | disabled     |
    | istio                       | minikube | disabled     |
    | istio-provisioner           | minikube | disabled     |
    | logviewer                   | minikube | disabled     |
    | metallb                     | minikube | disabled     |
    | metrics-server              | minikube | disabled     |
    | nvidia-driver-installer     | minikube | enabled ✅    |
    | nvidia-gpu-device-plugin    | minikube | enabled ✅    |
    | registry                    | minikube | disabled     |
    | registry-aliases            | minikube | disabled     |
    | registry-creds              | minikube | disabled     |
    | storage-provisioner         | minikube | enabled ✅    |
    | storage-provisioner-gluster | minikube | disabled     |
    |-----------------------------|----------|--------------|

Here is a working example outside the minikube context:

    $ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
    Fri Jun  5 09:23:49 2020       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 106...  Off  | 00000000:01:00.0  On |                  N/A |
    |  0%   51C    P8     6W / 120W |   1293MiB /  6077MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

OhH*_*ark 2

This is a community wiki answer. Feel free to edit and expand it if needed.

Minikube's docker driver does not officially support Nvidia GPUs. That leaves you with two possible options:

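  1. Try using the NVIDIA Container Toolkit and the NVIDIA device plugin. This is a workaround and may not be the best solution for your use case.

  2. Use the KVM2 driver or the None driver. Both are officially supported and documented.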
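As a rough illustration of option 2, the sketch below prints the commands for each supported driver (it only runs them when DRY_RUN=0). The helper names run and start_gpu_cluster are hypothetical. The KVM2 path additionally assumes GPU passthrough is already configured on the host, which is not shown here, and the None driver path leaves out installing a device plugin.

```shell
# Sketch only: print (default) or execute the minikube commands for a
# GPU-capable driver. Set DRY_RUN=0 to actually run them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper taking the driver name: kvm2 or none.
start_gpu_cluster() {
  case "$1" in
    kvm2)
      # Assumes host-side GPU passthrough is already set up.
      run minikube start --driver=kvm2 --kvm-gpu
      # The same two addons that are enabled in the question.
      run minikube addons enable nvidia-gpu-device-plugin
      run minikube addons enable nvidia-driver-installer
      ;;
    none)
      # Runs Kubernetes directly on the host, so it needs root and uses
      # the host's own Docker daemon.
      run sudo minikube start --driver=none
      ;;
    *)
      echo "unsupported driver: $1" >&2
      return 1
      ;;
  esac
}

start_gpu_cluster kvm2
```

With the None driver the cluster uses the host's Docker daemon, which is the one where docker run --gpus all already works in the question.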

I hope it helps.

  • Could you elaborate on #1? Are there instructions for creating a minikube node with Nvidia capacity using these tools? (3 upvotes)