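I have spent the past week trying to create a Python experiment in Azure ML studio. The job trains a PyTorch (1.12.1) neural network in a custom environment with CUDA 11.6 for GPU acceleration. However, whenever I attempt any operation that moves a tensor to the GPU, I get a runtime error: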
import torch

device = torch.device("cuda")
test_tensor = torch.rand((3, 4), device="cpu")
test_tensor = test_tensor.to(device)  # .to() returns a new tensor; this is the call that raises
CUDA error: all CUDA-capable devices are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
I tried setting CUDA_LAUNCH_BLOCKING=1, but that did not change the result.
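For reference, here is a minimal sketch of one way to set the variable from inside the script. It assumes the variable has to be in the process environment before CUDA is initialized, so it is set before `torch` would be imported. The helper name `cuda_launch_blocking_enabled` is hypothetical and only there to confirm the setting took effect. Note that this flag only makes kernel launches synchronous so the stack trace points at the failing call; it is not expected to make the error go away.

```python
import os

# Set the flag before anything initializes CUDA. Doing it before
# `import torch` is the safest ordering (assumption: nothing else in the
# process has already touched the GPU).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def cuda_launch_blocking_enabled() -> bool:
    """Hypothetical helper: True if the flag is set to "1" in this process."""
    return os.environ.get("CUDA_LAUNCH_BLOCKING") == "1"


print(f"CUDA_LAUNCH_BLOCKING enabled? {cuda_launch_blocking_enabled()}")
# `import torch` and the training code would follow from here.
```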
I also tried checking whether CUDA is available:
print(f"Is cuda available? {torch.cuda.is_available()}")
print(f"Which is the current device? {torch.cuda.current_device()}")
print(f"How many devices do we have? …
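A slightly more defensive version of these checks could look like the sketch below. It is only an illustration, not a fix: the function name `cuda_report` and the dictionary keys are hypothetical, and it assumes that calls which need a working GPU may raise `RuntimeError` in this situation, so those are wrapped. It also records `CUDA_VISIBLE_DEVICES` and the CUDA version PyTorch was built against, which may help compare the environment with the driver on the compute node.

```python
import importlib.util
import os


def cuda_report() -> dict:
    """Hypothetical helper: collect basic CUDA diagnostics without crashing."""
    report = {
        # None means the variable is unset, so all GPUs should be visible.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if not report["torch_installed"]:
        return report

    import torch

    report["torch_version"] = torch.__version__
    # CUDA version PyTorch was built with; None for CPU-only builds.
    report["built_with_cuda"] = torch.version.cuda
    report["is_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()

    # Querying device names initializes CUDA and may raise, so guard it.
    if report["is_available"]:
        try:
            report["device_names"] = [
                torch.cuda.get_device_name(i)
                for i in range(report["device_count"])
            ]
        except RuntimeError as exc:
            report["device_error"] = str(exc)
    return report


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```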