While tracking down a GPU OOM error, I added the following checkpoints to my PyTorch code (running on a Google Colab P100):
import torch

learning_rate = 0.001
num_epochs = 50
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('check 1')
!nvidia-smi | grep MiB | awk '{print $9 $10 $11}'
model = MyModel()
print('check 2')
!nvidia-smi | grep MiB | awk '{print $9 $10 $11}'
model = model.to(device)
print('check 3')
!nvidia-smi | grep MiB | awk '{print $9 $10 $11}'
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
print('check 4')
!nvidia-smi | grep MiB | awk '{print $9 $10 …
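The checkpoints above depend on the column layout of the default `nvidia-smi` table, which makes the `grep`/`awk` pipeline fragile. As a rough sketch of an alternative, the same reading can be taken with `nvidia-smi`'s CSV query mode and parsed in Python. The helper names `parse_gpu_memory` and `gpu_memory_checkpoint` are hypothetical and not part of the original code, and the sketch assumes `nvidia-smi` is on the PATH (it returns `None` otherwise).

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' lines (values in MiB) into (used, total) int tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory_checkpoint(label):
    """Print used/total MiB per GPU and return the rows (None if nvidia-smi is unavailable)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ]
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        print(f"{label}: nvidia-smi not available")
        return None
    rows = parse_gpu_memory(out)
    for index, (used, total) in enumerate(rows):
        print(f"{label}: GPU {index} {used}MiB / {total}MiB")
    return rows
```

Each `print('check N')` plus `!nvidia-smi ...` pair could then be replaced by a single call such as `gpu_memory_checkpoint('check 1')`. Note that `nvidia-smi` reports memory held on the device as a whole, including the CUDA context and any memory cached by PyTorch's allocator, so the figures will generally be larger than the size of the tensors alone.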