Get the total amount of available GPU memory using PyTorch

Question

Har*_*sad 22 gpu pytorch google-colaboratory

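I am using the free Google Colab GPU for experiments and want to know how much GPU memory is available to play with. torch.cuda.memory_allocated() returns the GPU memory currently occupied, but how do we determine the total available memory using PyTorch?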

Answer 1

pro*_*sti 48

PyTorch can give you the total, reserved, and allocated memory (all in bytes):

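import torch

t = torch.cuda.get_device_properties(0).total_memory  # total memory of device 0, in bytes
r = torch.cuda.memory_reserved(0)   # bytes held by PyTorch's caching allocator
a = torch.cuda.memory_allocated(0)  # bytes currently occupied by tensors
f = r - a  # free inside reserved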

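To make the relationship between those three numbers easier to read, here is a minimal sketch that groups them and prints them in GiB. The helper names (`summarize_memory`, `to_gib`, `query_device`) are made up for illustration and are not part of PyTorch. Note that "unreserved" only means memory not held by this process's caching allocator; other processes on the same GPU may still be using part of it.

```python
def summarize_memory(total, reserved, allocated):
    """Split a device's memory into the buckets PyTorch's allocator sees (bytes)."""
    return {
        "total": total,
        "reserved": reserved,
        "allocated": allocated,
        # Cached by PyTorch but not currently backing any tensor.
        "free_in_reserved": reserved - allocated,
        # Not held by this process's allocator; other processes may use it.
        "unreserved": total - reserved,
    }


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 1024**3


def query_device(device=0):
    """Collect the three numbers from PyTorch for one CUDA device."""
    # Imported lazily so the helpers above work without PyTorch installed.
    import torch

    return summarize_memory(
        torch.cuda.get_device_properties(device).total_memory,
        torch.cuda.memory_reserved(device),
        torch.cuda.memory_allocated(device),
    )


if __name__ == "__main__":
    try:
        import torch
        has_cuda = torch.cuda.is_available()
    except ImportError:
        has_cuda = False

    if has_cuda:
        for name, value in query_device(0).items():
            print(f"{name:>16}: {to_gib(value):.2f} GiB")
    else:
        print("No CUDA device visible; nothing to report.")
```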

The Python bindings for NVIDIA's NVML library can give you information for the whole GPU (0 here means the first GPU device):

from pynvml import *
nvmlInit()
h = nvmlDeviceGetHandleByIndex(0)
info = nvmlDeviceGetMemoryInfo(h)
print(f'total    : {info.total}')
print(f'free     : {info.free}')
print(f'used     : {info.used}')


(pip install pynvml)
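
The NVML snippet above looks only at device 0 and never shuts the library down. As a rough sketch, the same calls can be looped over every GPU and wrapped in try/finally. The function name `report_all_gpus` is hypothetical, and passing the module in as an argument is just a convenience so the function can be tried with a stand-in object when no GPU is present.

```python
def report_all_gpus(nvml):
    """Return a list of (index, total, free, used) byte counts, one per GPU.

    `nvml` is expected to be the pynvml module (or an object with the same
    functions, which makes the loop easy to try without a GPU).
    """
    nvml.nvmlInit()
    try:
        rows = []
        for index in range(nvml.nvmlDeviceGetCount()):
            handle = nvml.nvmlDeviceGetHandleByIndex(index)
            info = nvml.nvmlDeviceGetMemoryInfo(handle)
            rows.append((index, info.total, info.free, info.used))
        return rows
    finally:
        # Release NVML even if one of the queries above raised.
        nvml.nvmlShutdown()


if __name__ == "__main__":
    try:
        import pynvml
    except ImportError:
        print("pynvml is not installed (pip install pynvml)")
    else:
        try:
            for index, total, free, used in report_all_gpus(pynvml):
                print(f"GPU {index}: total={total} free={free} used={used}")
        except pynvml.NVMLError as err:
            print(f"NVML query failed: {err}")
```

Unlike the PyTorch counters, these numbers cover every process on the device, not just the current one.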

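You can check nvidia-smi for memory information. You can also use nvtop, but that tool has to be installed from source (at the time of writing). Another tool for checking memory is gpustat (pip3 install gpustat).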
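
If you want the nvidia-smi numbers inside a script rather than on screen, one option is to call it with its CSV query flags and parse the result. This is a sketch under the assumption that nvidia-smi is on the PATH and reports plain integers (with `nounits` the values are in MiB); the helper name `parse_smi_csv` is made up for illustration, and a device that reports `[N/A]` would make the integer conversion fail.

```python
import subprocess

# Ask nvidia-smi for one CSV row per GPU, without a header or unit suffixes.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.total,memory.used,memory.free",
    "--format=csv,noheader,nounits",
]


def parse_smi_csv(text):
    """Turn nvidia-smi CSV output into a list of dicts (values in MiB)."""
    rows = []
    for line in text.strip().splitlines():
        index, total, used, free = (int(field) for field in line.split(","))
        rows.append({"index": index, "total": total, "used": used, "free": free})
    return rows


if __name__ == "__main__":
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"Could not run nvidia-smi: {err}")
    else:
        for row in parse_smi_csv(result.stdout):
            print(row)
```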

If you would like to use C++ with CUDA:

#include <iostream>
#include "cuda.h"
#include "cuda_runtime_api.h"
  
using namespace std;
  
int main( void ) {
    int num_gpus;
    size_t free, total;
    cudaGetDeviceCount( &num_gpus );
    for ( int gpu_id = 0; gpu_id < num_gpus; gpu_id++ ) {
        cudaSetDevice( gpu_id );
        int id;
        cudaGetDevice( &id );
        cudaMemGetInfo( &free, &total );
        cout << "GPU " << id << " memory: free=" << free << ", total=" << total << endl;
    }
    return 0;
}


`torch.cuda.memory_cached` has been renamed to `torch.cuda.memory_reserved`. (3 upvotes)

Answer 2

Ima*_*man 34

In recent versions of PyTorch you can also use torch.cuda.mem_get_info:

https://pytorch.org/docs/stable/generated/torch.cuda.mem_get_info.html#torch.cuda.mem_get_info

torch.cuda.mem_get_info()


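It returns a tuple in which the first element is the free memory and the second element is the total memory of the device, both in bytes.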
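
One way to use that tuple is to check, before allocating, whether a tensor of a given size is likely to fit. The sketch below is only an illustration: `fits_in_free_memory` and its `headroom` parameter are hypothetical names, and the 10% margin is an arbitrary assumption rather than a PyTorch recommendation.

```python
def fits_in_free_memory(num_elements, bytes_per_element, free_bytes, headroom=0.1):
    """Return True if the requested buffer fits in free memory with some margin.

    `headroom` is the fraction of free memory to leave untouched (assumed 10%).
    """
    needed = num_elements * bytes_per_element
    return needed <= free_bytes * (1.0 - headroom)


if __name__ == "__main__":
    try:
        import torch
        has_cuda = torch.cuda.is_available()
    except ImportError:
        has_cuda = False

    if has_cuda:
        # (free, total) in bytes for the current CUDA device.
        free, total = torch.cuda.mem_get_info()
        print(f"free={free / 1024**3:.2f} GiB, total={total / 1024**3:.2f} GiB")
        # Would a 1024 x 1024 x 256 float32 tensor (4 bytes per element) fit?
        print(fits_in_free_memory(1024 * 1024 * 256, 4, free))
    else:
        print("No CUDA device visible; nothing to report.")
```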

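This is better than the accepted answer (which uses `total_memory` plus reserved/allocated), because it gives correct numbers when other processes or users share the GPU and take up memory. (4 upvotes)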

Archived:	6 years, 4 months ago
Views:	25774
Last activity:	4 years, 6 months ago