RuntimeError: CUDA out of memory. Tried to allocate 2.86 GiB (GPU 0; 10.92 GiB total capacity; ... 9.06 GiB reserved in total by PyTorch)

What does "9.06 GiB reserved in total by PyTorch" mean?

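If I run the same script on a smaller GPU with "7.80 GiB total capacity", it shows "6.20 GiB reserved in total by PyTorch". How does reservation work in PyTorch, and why does the reserved memory change with the size of the GPU?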

To resolve the error message "RuntimeError: CUDA out of memory. Tried to allocate 2.86 GiB (GPU 0; 10.92 GiB total capacity; 9.02 GiB already allocated; 1.29 GiB free; 9.06 GiB reserved in total by PyTorch)", I tried reducing the batch size from 10 to 5 to 3. I tried using del x_train1. I also tried using torch.cuda.empty_cache(). I also used with torch.no_grad() when applying the pretrained model, x_train1 = bert_model(train_indices)[2], as well as when training and validating the new model. None of these worked.
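To make the figures in that message concrete, here is a minimal sketch that parses the message and does the arithmetic. The helper name parse_oom_message and the derived quantities are illustrative assumptions, not part of PyTorch. It assumes that "reserved" covers both memory held by live tensors ("already allocated") and memory the caching allocator keeps for reuse, and it ignores fragmentation, so the "available" figure is at best an upper bound.

```python
import re

# Error text copied from the question; everything else here is illustrative.
MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 2.86 GiB "
    "(GPU 0; 10.92 GiB total capacity; 9.02 GiB already allocated; "
    "1.29 GiB free; 9.06 GiB reserved in total by PyTorch)"
)


def parse_oom_message(message):
    """Extract the GiB figures from an OOM message in this format (hypothetical helper)."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "free": r"([\d.]+) GiB free",
        "reserved": r"([\d.]+) GiB reserved in total by PyTorch",
    }
    return {
        name: float(re.search(pattern, message).group(1))
        for name, pattern in patterns.items()
    }


stats = parse_oom_message(MESSAGE)

# Memory reserved by the allocator but not currently backing live tensors.
cached_unused = round(stats["reserved"] - stats["allocated"], 2)

# Assumed upper bound on what a new allocation could draw on:
# free device memory plus the unused part of the reservation.
available = round(stats["free"] + cached_unused, 2)

# How far the request exceeds that bound.
shortfall = round(stats["requested"] - available, 2)

print(f"cached but unused: {cached_unused} GiB")
print(f"available at most: {available} GiB")
print(f"shortfall:         {shortfall} GiB")
```

Under these assumptions, almost all of the reserved memory is held by live tensors, which would be consistent with torch.cuda.empty_cache() making no difference here, since it can only return cached blocks that are not in use.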

Here is the trace:

cuda:0
    x_train1 = bert_model(train_indices)[2]  # Models outputs are tuples
  File "/home/kosimadukwe/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kosimadukwe/miniconda3/lib/python3.7/site-packages/transformers/modeling_bert.py", line 783, in forward
    input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
  File "/home/kosimadukwe/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/kosimadukwe/miniconda3/lib/python3.7/site-packages/transformers/modeling_bert.py", line 177, in forward
    embeddings = inputs_embeds + position_embeddings + token_type_embeddings
RuntimeError: CUDA out of memory. Tried to allocate 2.86 GiB (GPU 0; 10.92 GiB total capacity; 9.02 GiB already allocated; 1.29 GiB free; 9.06 GiB reserved in total by PyTorch)


And the nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.36       Driver Version: 440.36       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:3B:00.0 Off |                  N/A |
| 54%   79C    P2   233W / 250W |   8613MiB / 11178MiB |    100%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:AF:00.0 Off |                  N/A |
| 58%   79C    P2   247W / 250W |   4545MiB / 11178MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:D8:00.0 Off |                  N/A |
| 23%   29C    P0    56W / 250W |      0MiB / 11178MiB |      2%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0   1025219      C   /usr/pkg/bin/python3.8                      8601MiB |
|    1   1024440      C   /usr/pkg/bin/python3.8                      4535MiB |


And:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '2'

