如何修复 Pytorch 中的“运行时错误:CUDA 错误:触发设备端断言”

alm*_*zer 3 gpu pytorch yolo

我正在尝试 在我的自定义形状数据集上从此 repo https://github.com/eriklindernoren/PyTorch-YOLOv3训练 yolo-v3 模型,但我不断收到错误“运行时错误:CUDA 错误:触发设备端断言”

我试图查找解决方案并尝试了不同答案中建议的几件事(例如修复注释中类的索引),但错误仍然存​​在。

我正在按照 repo 自述文件中的描述对自定义数据集进行训练,并相应地调整了 custom.data 和 data/custom/。

我一直收到这个输出。

C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [33,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [34,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [35,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [36,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [37,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [38,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [39,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [40,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [41,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [42,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [43,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [44,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [45,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [1,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [2,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [3,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [4,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [5,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [6,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [7,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [12,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [13,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [14,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [15,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [20,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [21,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [22,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [23,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [24,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [25,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [26,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [27,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [28,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [29,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: block: [0,0,0], thread: [31,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
  File "train.py", line 105, in <module>
    loss, outputs = model(imgs, targets)
  File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Documents\GP\Code\TorchYolo\PyTorch-YOLOv3\models.py", line 259, in forward
    x, layer_loss = module[0](x, targets, img_dim)
  File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Documents\GP\Code\TorchYolo\PyTorch-YOLOv3\models.py", line 188, in forward
    ignore_thres=self.ignore_thres,
  File "D:\Documents\GP\Code\TorchYolo\PyTorch-YOLOv3\utils\utils.py", line 318, in build_targets
    iou_scores[b, best_n, gj, gi] = bbox_iou(pred_boxes[b, best_n, gj, gi], target_boxes, x1y1x2y2=False)
  File "D:\Documents\GP\Code\TorchYolo\PyTorch-YOLOv3\utils\utils.py", line 199, in bbox_iou
    b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
RuntimeError: CUDA error: device-side assert triggered

Run Code Online (Sandbox Code Playgroud)

与 train.jpg 类标签索引混淆时,唯一改变的是数组索引中的“2”

b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
Run Code Online (Sandbox Code Playgroud)

cod*_*ina 8

我在 google colab 上训练 resnet 模型时也遇到了这个问题。所以在我的例子中,我在 7 个类上训练模型,但我的网络的最后一层设置为输出 3 个类。

因此我改变了

self.classifier = torch.nn.Sequential(torch.nn.BatchNorm1d(512), torch.nn.Linear(512, 3))
Run Code Online (Sandbox Code Playgroud)

对此

self.classifier = torch.nn.Sequential(torch.nn.BatchNorm1d(512), torch.nn.Linear(512, 7))
Run Code Online (Sandbox Code Playgroud)

这样做之后,我仍然收到错误消息,因为我没有重新启动 google colab。

请记住,每当您收到此错误时请检查两件事:

  1. class_labels 应该从 0 开始,即在我的情况下 [0,1,2,3,4,5,6] 为 7 个类。
  2. 检查最终输出层是否输出准确的类数

在那之后,

刷新笔记本以刷新所有 cuda 断言。

在出现任何 CUDA 错误后,重新启动笔记本,否则您将继续收到 CUDA 错误,因为之前的断言尚未清除。通过重新启动笔记本,您将清除所有 cuda 断言。


Sha*_*hai 7

通常,当您遇到神秘的 CUDA 错误时,您应该切换到 CPU,看看是否会收到更有意义的错误消息。

或者,设置CUDA_LAUNCH_BLOCKING=1为获取更多信息的堆栈跟踪(有关详细信息,请参阅此答案)。

有关更多详细信息,请参阅此答案。


,在你的情况下,你除以 2 似乎会创建分数,而 pytorch 会在其中寻找整数。尝试

b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] // 2, box1[:, 0] + box1[:, 2] // 2
Run Code Online (Sandbox Code Playgroud)