Using a GPU in a VS Code container

Ali*_* K. 9 python docker visual-studio-code tensorflow

I want to use a GPU inside a Visual Studio Code Docker container to train a model with TensorFlow. To build the image for my container, I use the following Dockerfile:

FROM mcr.microsoft.com/vscode/devcontainers/anaconda:0-3


ARG PROJECT_NAME=fire_rec

ARG NODE_VERSION="none"
RUN if [ "${NODE_VERSION}" != "none" ]; then su vscode -c "umask 0002 && . /usr/local/share/nvm/nvm.sh && nvm install ${NODE_VERSION} 2>&1"; fi


COPY environment.yml* .devcontainer/noop.txt /tmp/conda-tmp/
RUN if [ -f "/tmp/conda-tmp/environment.yml" ]; then umask 0002 && /opt/conda/bin/conda env update -n base -f /tmp/conda-tmp/environment.yml; fi \
    && rm -rf /tmp/conda-tmp


WORKDIR /srv/${PROJECT_NAME}

COPY requirements.txt /srv/${PROJECT_NAME}

RUN apt-get update && apt-get install -y python3-opencv
RUN apt-get update && apt-get install -y pip
RUN python3 -m pip install --no-cache -r requirements.txt
RUN apt-get update && apt-get install -y nvidia-cuda-toolkit

"requirements.txt" contains:

opencv-python
tensorflow-gpu
numpy
matplotlib
albumentations
tensorflow_addons

I also have a .devcontainer.json file:

{
    "name": "Anaconda (Python 3)",
    "build": { 
        "context": "..",
        "dockerfile": "Dockerfile",
        "args": {
            "NODE_VERSION": "none"
        }
    },

    "settings": { 
        "python.defaultInterpreterPath": "/opt/conda/bin/python",
        "python.linting.enabled": true,
        "python.linting.pylintEnabled": true,
        "python.formatting.autopep8Path": "/opt/conda/bin/autopep8",
        "python.formatting.yapfPath": "/opt/conda/bin/yapf",
        "python.linting.flake8Path": "/opt/conda/bin/flake8",
        "python.linting.pycodestylePath": "/opt/conda/bin/pycodestyle",
        "python.linting.pydocstylePath": "/opt/conda/bin/pydocstyle",
        "python.linting.pylintPath": "/opt/conda/bin/pylint"
    },

    "extensions": [
        "ms-python.python",
        "ms-python.vscode-pylance"
    ],

    "remoteUser": "vscode",
}

The image builds successfully and the container starts. But when I run this code in a Jupyter notebook inside the container:

import tensorflow as tf

tf.config.list_physical_devices('GPU')

I get the following messages:

2022-05-05 14:42:02.712454: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2022-05-05 14:42:02.712483: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist

So this code cannot use the GPU. How can I fix this?

Rez*_*eza 17

Make sure the NVIDIA Container Toolkit is installed on the host. Then add this to your .devcontainer.json:

"runArgs": [
    "--gpus",
    "all"
]

Check this to see how to add more options to .devcontainer.json.
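After rebuilding the container with the GPU arguments, a quick check along these lines may help confirm that the GPU is actually visible. This is only a sketch and not part of the original answer: the helper name `visible_nvidia_devices` is made up for illustration, and it assumes the NVIDIA runtime exposes device nodes such as /dev/nvidia0 (the node the error message in the question says is missing).

```python
import glob
import shutil


def visible_nvidia_devices(pattern="/dev/nvidia[0-9]*"):
    """Return the NVIDIA device nodes visible in this environment.

    Hypothetical helper; the default pattern is an assumption based on
    the /dev/nvidia0 path mentioned in the TensorFlow error message.
    """
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    devices = visible_nvidia_devices()
    print("device nodes:", devices or "none found")
    print("nvidia-smi on PATH:", shutil.which("nvidia-smi") is not None)

    # TensorFlow is optional here so the check also runs without it.
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this interpreter")
    else:
        print("TensorFlow GPUs:", tf.config.list_physical_devices("GPU"))
```

If no device nodes are listed, the container was most likely started without GPU access, and the problem is on the Docker side rather than in TensorFlow.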


Min*_*SFT 0

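Prerequisites:

  1. The machine has a GPU and the GPU driver is installed;

  2. The GPU environment, such as CUDA, is installed;

  3. The PM attribute is turned on in NVIDIA-SMI;

  4. The GPU device is specified in the program.

Run the Python program in the terminal with the command: CUDA_VISIBLE_DEVICES=0 python filename.py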
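In a notebook there is no command line to prefix, so the same variable can be set from Python instead. A minimal sketch, assuming the GPU with index 0 is the one you want; the variable has to be set before TensorFlow is imported, and it only selects among GPUs the container can already see, so it does not replace giving the container GPU access.

```python
import os

# Select GPU 0. This must happen before TensorFlow initializes CUDA,
# so it comes before the import below.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow is not installed in this interpreter

if tf is not None:
    # An empty list means TensorFlow still cannot see any GPU.
    print(tf.config.list_physical_devices("GPU"))
```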