How to use an AMD GPU with the DALL-E Playground AI server through WSL

Joe*_*ray 5 ubuntu amd-gpu windows-subsystem-for-linux amd-rocm wsl-2

I'm trying to run and deploy DALL-E Playground on my local machine using an AMD GPU. I'm running a WSL instance on Windows 11.

Link to the DALL-E Playground repository

System OS: Windows 11 Pro - Version 21H1 - OS Build 22000.675
WSL Version: WSL 2 
WSL Kernel: 5.10.16.3-microsoft-standard-WSL2
WSL OS: Ubuntu 20.04 LTS
GPU: AMD Radeon RX 6600 XT
CPU: AMD Ryzen 5 3600XT (32GB ram)

I've been able to deploy the backend and frontend successfully, but everything runs on the CPU.

It gives me this warning:

--> Starting DALL-E Server. This might take up to two minutes.
2022-06-12 01:16:33.012306: I external/org_tensorflow/tensorflow/core/tpu/tpu_initializer_helper.cc:259] Libtpu path is: libtpu.so
2022-06-12 01:16:37.581440: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:174] XLA service 0x5a4e760 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2022-06-12 01:16:37.581474: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:182]   StreamExecutor device (0): Interpreter, <undefined>
2022-06-12 01:16:37.587860: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/tfrt_cpu_pjrt_client.cc:176] TfrtCpuClient created.
2022-06-12 01:16:37.588478: I external/org_tensorflow/tensorflow/stream_executor/tpu/tpu_platform_interface.cc:74] No TPU platform found.
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
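The `No GPU/TPU found, falling back to CPU` warning appears to come from JAX's XLA runtime. The log mentions the XLA service and `TfrtCpuClient`. A quick way to see what the server's runtime detects is to ask JAX directly. This is a minimal sketch. It assumes the backend's Python environment has `jax` installed, and it returns `None` if JAX cannot be imported:

```python
import importlib.util


def jax_backend_report():
    """Return which backend and devices JAX detects, or None if JAX is missing."""
    if importlib.util.find_spec("jax") is None:
        return None
    import jax  # imported lazily so the check also runs where JAX is absent

    return {
        "backend": jax.default_backend(),  # e.g. "cpu", "gpu" or "tpu"
        "devices": [str(d) for d in jax.devices()],
    }


if __name__ == "__main__":
    print(jax_backend_report())
```

On the setup described here you would most likely see `"backend": "cpu"`, which matches the warning.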

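I've been trying to make my GPU accessible to the WSL instance, but I can't figure out what I'm doing wrong. To use the GPU, I installed PyTorch with the following command from the PyTorch website: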

pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm4.5.2

I confirmed that PyTorch was installed correctly by running the sample PyTorch code from its website.
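The server's warning appears to come from an XLA-based runtime (JAX) rather than from PyTorch. A working PyTorch install therefore does not, on its own, give the server a GPU. The sample code running also only shows that PyTorch is installed, not that it can reach the GPU.

The sketch below checks both points. It relies on two real PyTorch attributes:

- `torch.version.hip`, which holds a version string on ROCm builds and `None` otherwise.
- `torch.cuda.is_available()`, which ROCm builds also use for AMD GPUs.

```python
import importlib.util


def torch_gpu_report():
    """Describe the installed PyTorch build, or return None if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "torch_version": torch.__version__,
        # A version string on ROCm (HIP) builds, None on CUDA or CPU-only builds.
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm builds expose AMD GPUs through the torch.cuda namespace.
        "gpu_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    print(torch_gpu_report())
```

With the ROCm kernel driver not loaded, `gpu_available` would likely be `False` even though `hip_version` is set.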

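After some digging, I realized I also needed to install the rock-dkms package, so I followed the advice on that site and, after running into a lot of problems, installed it successfully.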

When I try to check the GPU with the ROCm tools, I get the following:

$ /opt/rocm/bin/rocminfo
ROCk module is NOT loaded, possibly no GPU devices

$ /opt/rocm/opencl/bin/clinfo
Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1 AMD-APP (3423.0)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback


  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               0
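`ROCk module is NOT loaded` means the ROCm kernel driver interface is not present. On native Linux, ROCm reaches the GPU through the `/dev/kfd` device node. WSL 2 instead exposes the GPU through the paravirtualized `/dev/dxg` device, which serves DirectX-based stacks.

This minimal sketch reports which of these two nodes exist inside the distribution. The descriptions are a rough heuristic, not an official check:

```python
import os

# Device nodes relevant to GPU access, with their usual meaning.
GPU_NODES = {
    "/dev/kfd": "ROCm kernel driver interface (needed by rocminfo/HIP)",
    "/dev/dxg": "WSL 2 GPU paravirtualization (DirectX path)",
}


def gpu_node_report(nodes=GPU_NODES, exists=os.path.exists):
    """Map each device node path to whether it exists on this system."""
    return {path: exists(path) for path in nodes}


if __name__ == "__main__":
    for path, present in gpu_node_report().items():
        status = "present" if present else "missing"
        print(f"{path:10} {status:8} {GPU_NODES[path]}")
```

Here you would expect `/dev/dxg` to be present and `/dev/kfd` to be missing. That would be consistent with `rocminfo` failing while `glxinfo` works.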

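According to this reply, there does seem to be some AMD-compatible driver available. The attached image shows what appears when I try to query the GPU. At this point I don't know whether WSL can see and/or access my GPU: glxinfo recognizes it, but nothing else does (and even glxinfo reports the wrong amount of VRAM).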

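Any advice would be very helpful. I know this question may not be specific to the project, but I've tried to include as much information as possible about what I'm doing so that the problem can be solved.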

Installed AMD GPU libraries:

$ sudo apt list|grep -i gpu|grep installed

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libdrm-amdgpu1/focal-updates,focal-security,now 2.4.107-8ubuntu1~20.04.2 amd64 [installed]
libosdgpu3.4.0/focal,now 3.4.0-6build1 amd64 [installed,automatic]

Installed ROCm packages:

$ apt list --installed | grep -i roc

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

hsa-rocr-dev/Ubuntu,now 1.5.0.50100-36 amd64 [installed,automatic]
hsa-rocr/Ubuntu,now 1.5.0.50100-36 amd64 [installed,automatic]
hsakmt-roct-dev/Ubuntu,now 20220128.1.7.50100-36 amd64 [installed,automatic]
hsakmt-roct/Ubuntu,now 20210520.3.071986.40301-59 amd64 [installed,automatic]
libopencv-imgproc4.2/focal,now 4.2.0+dfsg-5 amd64 [installed,automatic]
libpostproc55/focal-updates,focal-security,now 7:4.2.7-0ubuntu0.1 amd64 [installed,automatic]
libprocps8/focal-updates,now 2:3.3.16-1ubuntu2.3 amd64 [installed,automatic]
procps/focal-updates,now 2:3.3.16-1ubuntu2.3 amd64 [installed,automatic]
python3-ptyprocess/focal,now 0.6.0-1ubuntu1 all [installed,automatic]
rock-dkms-firmware/Ubuntu,now 1:4.3-59 all [installed,automatic]
rock-dkms/Ubuntu,now 1:4.3-59 all [installed,automatic]
rocm-clang-ocl/Ubuntu,now 0.5.0.50100-36 amd64 [installed,automatic]
rocm-cmake/Ubuntu,now 0.7.2.50100-36 amd64 [installed,automatic]
rocm-core/Ubuntu,now 5.1.0.50100-36 amd64 [installed,automatic]
rocm-dbgapi/Ubuntu,now 0.64.0.50100-36 amd64 [installed,automatic]
rocm-debug-agent/Ubuntu,now 2.0.3.50100-36 amd64 [installed,automatic]
rocm-dev/Ubuntu,now 5.1.0.50100-36 amd64 [installed,automatic]
rocm-device-libs/Ubuntu,now 1.0.0.50100-36 amd64 [installed,automatic]
rocm-dkms/Ubuntu,now 5.1.0.50100-36 amd64 [installed]
rocm-gdb/Ubuntu,now 11.2.50100-36 amd64 [installed,automatic]
rocm-llvm/Ubuntu,now 14.0.0.22114.50100-36 amd64 [installed,automatic]
rocm-ocl-icd/Ubuntu,now 2.0.0.50100-36 amd64 [installed,automatic]
rocm-opencl-dev/Ubuntu,now 2.0.0.50100-36 amd64 [installed,automatic]
rocm-opencl/Ubuntu,now 2.0.0.50100-36 amd64 [installed,automatic]
rocm-smi-lib/Ubuntu,now 5.0.0.50100-36 amd64 [installed,automatic]
rocm-utils/Ubuntu,now 5.1.0.50100-36 amd64 [installed,automatic]
rocminfo/Ubuntu,now 1.0.0.50100-36 amd64 [installed,automatic]
rocprofiler-dev/Ubuntu,now 1.0.0.50100-36 amd64 [installed,automatic]
roctracer-dev/Ubuntu,now 1.0.0.50100-36 amd64 [installed,automatic]
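`grep -i roc` also matches unrelated packages such as `procps`, `libpostproc55` and `python3-ptyprocess`. The list also mixes release series. For example, `rock-dkms` is `1:4.3-59` and `hsakmt-roct` ends in `40301`, while `rocm-core` and most other packages end in `50100`.

The sketch below parses `apt list --installed` lines, keeps ROCm-related names, and decodes the five-digit build suffix, which makes such mismatches easier to spot. Two parts are assumptions:

- The name prefixes used as a filter.
- Reading a suffix like `50100` as release 5.1.0 (major, two-digit minor, two-digit patch). This is inferred from the versions shown above.

```python
import re

# Matches lines such as:
# "rocm-core/Ubuntu,now 5.1.0.50100-36 amd64 [installed,automatic]"
APT_LINE = re.compile(
    r"^(?P<name>[^/\s]+)/\S+\s+(?P<version>\S+)\s+(?P<arch>\S+)\s+\[(?P<flags>[^\]]*)\]"
)

# Assumed prefixes for ROCm-related packages (not an official list).
ROCM_PREFIXES = ("roc", "hsa")

# Assumed encoding: ".MmmPP-" where M = major, mm = minor, PP = patch.
ROCM_BUILD = re.compile(r"\.(\d)(\d{2})(\d{2})-")


def rocm_packages(apt_output):
    """Return {package: version} for ROCm-related lines of `apt list --installed`."""
    found = {}
    for line in apt_output.splitlines():
        m = APT_LINE.match(line.strip())
        if m and m.group("name").startswith(ROCM_PREFIXES):
            found[m.group("name")] = m.group("version")
    return found


def rocm_series(version):
    """Decode a suffix like '.50100-' into '5.1.0'; None if no such suffix."""
    m = ROCM_BUILD.search(version)
    if m is None:
        return None
    major, minor, patch = (int(g) for g in m.groups())
    return f"{major}.{minor}.{patch}"
```

For example, feed it the output of `subprocess.run(["apt", "list", "--installed"], capture_output=True, text=True).stdout`. Then group the packages by `rocm_series(version)` to see whether more than one release is installed.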

[Screenshot: trying to find the GPU]

Edit: I just ran the following command and got more information about the current state of my system.

$ glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Microsoft Corporation (0xffffffff)
    Device: D3D12 (AMD Radeon RX 6600 XT) (0xffffffff)
    Version: 22.2.0
    Accelerated: yes
    Video memory: 24485MB
    Unified memory: no
    Preferred profile: core (0x1)
    Max core profile version: 4.2
    Max compat profile version: 4.2
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.1
OpenGL vendor string: Microsoft Corporation
OpenGL renderer string: D3D12 (AMD Radeon RX 6600 XT)
OpenGL core profile version string: 4.2 (Core Profile) Mesa 22.2.0-devel (git-cbcdcc4 2022-06-11 focal-oibaf-ppa)
OpenGL core profile shading language version string: 4.20
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.2 (Compatibility Profile) Mesa 22.2.0-devel (git-cbcdcc4 2022-06-11 focal-oibaf-ppa)
OpenGL shading language version string: 4.20
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile

OpenGL ES profile version string: OpenGL ES 3.1 Mesa 22.2.0-devel (git-cbcdcc4 2022-06-11 focal-oibaf-ppa)
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
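The renderer string `D3D12 (AMD Radeon RX 6600 XT)` indicates that OpenGL goes through Mesa's D3D12 driver, which translates calls to DirectX 12 on the Windows host. It does not use a native Linux AMD driver. That would explain why `glxinfo` and `glmark2` can use the GPU while ROCm cannot.

This minimal sketch pulls the renderer out of `glxinfo -B` output and flags that case. The `D3D12 (` prefix check is a heuristic based on the output above:

```python
def glx_renderer(glxinfo_output):
    """Return the value of 'OpenGL renderer string' from `glxinfo -B`, or None."""
    prefix = "OpenGL renderer string:"
    for line in glxinfo_output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


def is_wsl_d3d12(renderer):
    """Heuristic: Mesa's D3D12 driver reports 'D3D12 (<host GPU name>)'."""
    return renderer is not None and renderer.startswith("D3D12 (")
```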

To make things more confusing, glmark2 seems to use my GPU just fine. Could this be a problem with the DALL-E program rather than with WSL?

[Screenshot: GLmark2 running successfully despite the problems with DALL-E Playground]

Update: This might be solvable with ZLUDA, but I haven't tested it.

Joe*_*ray 5

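The problem ultimately came down to this: AMD GPUs don't support CUDA, and the DALL-E Playground project only supports CUDA. To run DALL-E Playground on a GPU, you basically have to use an Nvidia GPU. Alternatively, you can run the project on the CPU.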
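If you run the project on the CPU, you can make that choice explicit rather than relying on the fallback warning. This sketch assumes the backend uses JAX. Newer JAX releases read the `JAX_PLATFORMS` environment variable and older ones read `JAX_PLATFORM_NAME`. Setting both to `cpu` before the backend process starts should therefore select the CPU with either. The helper name `cpu_only_env` is hypothetical.

```python
import os


def cpu_only_env(base=None):
    """Return a copy of an environment mapping that pins JAX to the CPU backend."""
    env = dict(os.environ if base is None else base)
    env["JAX_PLATFORMS"] = "cpu"      # read by newer JAX releases
    env["JAX_PLATFORM_NAME"] = "cpu"  # read by older JAX releases
    return env
```

You can pass the result as `env=cpu_only_env()` to `subprocess.run` when launching the backend. Alternatively, export the same two variables in the shell before starting the server.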

I hope this answers any questions anyone might have.