torch cannot allocate a small tensor (< 1 GB) on the GPU, but can allocate it on the CPU of the same node with 400+ GB of memory on Databricks

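This question is related to my previous one, "pytorch allocate memory for small size tensor on cpu and gpu but got error on a node with more than 400 GB". It is a different problem, though, so I created a new thread.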

In this question, I changed the size of the input tensor.

 import torch
 from torch import nn

 # Embedding table: 14000 rows x 300 dims (float32 weights by default).
 num_embedding, num_dim = 14000, 300
 embedding = nn.Embedding(num_embedding, num_dim)

 # Index tensor: 8000 rows, each holding the indices 0..301.
 row, col = 8000, 302
 t = [[x for x in range(col)] for _ in range(row)]

 t1 = torch.tensor(t)
 print(t1.shape)  # torch.Size([8000, 302])

 # Size of the index tensor in GiB.
 type(t1), t1.device, (t1.nelement() * t1.element_size()) / (1024**3)
 # (torch.Tensor, device(type='cpu'), 0.01800060272216797)

 # The lookup works on the CPU.
 tt = embedding(t1)
 embedding.forward(t1)

 # Move the indices to the GPU.
 t2 = t1.cuda()
 t2.device, t2.shape, t2.grad, t2.nelement(), t2.element_size(), (t2.nelement() * t2.element_size()) / (1024**3)
 # (device(type='cuda', index=0), torch.Size([8000, 302]), None, 2416000, 8, 0.01800060272216797)

 # The same lookup fails on the GPU.
 embedding_cuda = embedding.cuda()
 torch.cuda.empty_cache()
 embedding_cuda(t2)
 # RuntimeError: CUDA out of memory. Tried to allocate 2.70 GiB (GPU 0; 11.17 GiB total capacity; 7.19 GiB already allocated; 2.01 GiB free; 8.88 GiB reserved in total by PyTorch)


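Why can a small tensor (0.018 GB) be allocated on the CPU but not on the GPU of the same node (p2.8xlarge)? And why does it need 2.7 GB, which is at least 100 times larger than the original size?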
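For reference, the 2.70 GiB figure looks consistent with the size of the lookup's output rather than its input. The sketch below is plain arithmetic, not a measurement. It assumes int64 indices (the `element_size()` of 8 reported above) and float32 embedding weights (the default for `nn.Embedding`); the helper name `tensor_gib` is made up for illustration.

```python
# Back-of-the-envelope sizes for the embedding lookup in the question.
# Assumptions: int64 indices (8 bytes) and float32 embedding weights (4 bytes).

GIB = 1024 ** 3


def tensor_gib(num_elements: int, bytes_per_element: int) -> float:
    """Return the size of a dense tensor in GiB."""
    return num_elements * bytes_per_element / GIB


row, col, num_dim = 8000, 302, 300

input_gib = tensor_gib(row * col, 8)             # index tensor, shape (8000, 302)
output_gib = tensor_gib(row * col * num_dim, 4)  # lookup result, shape (8000, 302, 300)

print(f"input:  {input_gib:.3f} GiB")
print(f"output: {output_gib:.2f} GiB")
print(f"ratio:  {output_gib / input_gib:.0f}x")
```

Under these assumptions the output is about 150 times the size of the input, which would match the "Tried to allocate 2.70 GiB" in the error message. This calculation does not account for the 7.19 GiB reported as already allocated.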

I have checked most of the posts at https://stackoverflow.com/search?q=RuntimeError%3A+CUDA+out+of+memory.+Tried+to+allocate+GiB, but none of them helped me solve this problem.

Asked:	4 years, 11 months ago
Viewed:	595 times
Last active:	4 years, 10 months ago
