How do I train a model on multiple GPUs in PyTorch?

Question

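My server has two GPUs. How can I use both of them for training at the same time, to get the most out of their compute capacity? Is my code below correct? Will it train my model properly?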

import torch
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # pretrained_model is loaded elsewhere (not shown in this snippet)
        self.bert = pretrained_model
        # for param in self.bert.parameters():
        #     param.requires_grad = True
        # in_features must match the hidden size of the encoder
        self.linear = nn.Linear(2048, 4)

    # def forward(self, input_ids, token_type_ids, attention_mask):
    def forward(self, input_ids, attention_mask):
        # output = self.bert(input_ids, token_type_ids, attention_mask).pooler_output
        output = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        print('last_hidden_state', output.shape)  # (batch_size, seq_len, hidden_size)
        output = output[:, -1, :]  # hidden state of the last token: (batch_size, hidden_size)
        output = self.linear(output)
        return output

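device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel()  # create the model even when fewer than two GPUs are present
if torch.cuda.device_count() > 1:
    print("Use", torch.cuda.device_count(), "gpus")
    model = nn.DataParallel(model)  # replicate the model and split each batch across the GPUs
model = model.to(device)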

Answer 1

Maz*_*zen 8

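There are two different ways to train on multiple GPUs:

Data parallelism = split a large batch that does not fit into a single GPU's memory across multiple GPUs, so that each GPU processes a smaller batch that does fit.
Model parallelism = split the layers of the model across different devices, which is a bit tricky to manage and deal with.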
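To make the two definitions concrete, here is a minimal sketch of both. The toy layers, the sizes, and the class name TwoStageModel are made up for illustration and are not from the question. The code falls back to the CPU when fewer than two GPUs are available, so it runs anywhere.

```python
import torch
import torch.nn as nn

n_gpus = torch.cuda.device_count()

# --- Data parallelism: the whole model is replicated, the batch is split ---
# Toy model (an assumption for illustration, not the model from the question).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
device = torch.device("cuda" if n_gpus > 0 else "cpu")
# nn.DataParallel splits the input along dim 0 across the visible GPUs and
# gathers the outputs on the first one. Without GPUs it just calls the module.
dp_model = nn.DataParallel(model).to(device)
batch = torch.randn(8, 16, device=device)
dp_out = dp_model(batch)


# --- Model parallelism: different layers live on different devices ---
class TwoStageModel(nn.Module):
    """Hypothetical two-stage model; each stage sits on its own device."""

    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.stage0 = nn.Sequential(nn.Linear(16, 32), nn.ReLU()).to(dev0)
        self.stage1 = nn.Linear(32, 4).to(dev1)

    def forward(self, x):
        h = self.stage0(x.to(self.dev0))
        # Activations have to be moved between devices by hand.
        return self.stage1(h.to(self.dev1))


dev0 = torch.device("cuda:0" if n_gpus >= 2 else "cpu")
dev1 = torch.device("cuda:1" if n_gpus >= 2 else "cpu")
mp_model = TwoStageModel(dev0, dev1)
mp_out = mp_model(torch.randn(8, 16))
```

In the first half every GPU holds a full copy of the model and sees only part of the batch. In the second half every device holds only part of the model and sees the full batch, which is why the manual device transfers in forward are needed.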

See this post for more information.

To do data parallelism in pure PyTorch, refer to this example that I created a while ago to reflect the latest changes in PyTorch (1.12 as of today).
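The linked example is not reproduced here. As a rough idea of what data parallelism in pure PyTorch can look like, below is a minimal sketch built on DistributedDataParallel, with one process per GPU. The function name train_worker, the toy dataset, and the single linear layer are assumptions made for illustration. It falls back to the CPU with the gloo backend when CUDA is not available.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def train_worker(rank: int, world_size: int, init_method: str, epochs: int = 2) -> float:
    """Hypothetical per-process training loop; returns the last batch loss."""
    use_cuda = torch.cuda.is_available()
    backend = "nccl" if use_cuda else "gloo"
    dist.init_process_group(backend, init_method=init_method,
                            rank=rank, world_size=world_size)
    device = torch.device(f"cuda:{rank}" if use_cuda else "cpu")
    if use_cuda:
        torch.cuda.set_device(device)

    # Toy data (assumption): 64 samples, 8 features, 4 classes.
    # The fixed seed makes every process build the same dataset.
    torch.manual_seed(0)
    dataset = TensorDataset(torch.randn(64, 8), torch.randint(0, 4, (64,)))
    # The sampler hands each process a disjoint shard of the dataset.
    sampler = DistributedSampler(dataset, num_replicas=world_size,
                                 rank=rank, shuffle=True)
    loader = DataLoader(dataset, batch_size=16, sampler=sampler)

    model = nn.Linear(8, 4).to(device)
    ddp_model = DDP(model, device_ids=[rank] if use_cuda else None)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    last_loss = float("nan")
    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # reshuffle differently on every epoch
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = loss_fn(ddp_model(x), y)
            loss.backward()  # gradients are averaged across processes here
            optimizer.step()
            last_loss = loss.item()

    dist.destroy_process_group()
    return last_loss
```

One way to launch it is torch.multiprocessing.spawn(train_worker, args=(world_size, init_method), nprocs=world_size), which passes the process index as the first argument. The init_method can be a file:// URL pointing at a file that does not exist yet, inside a directory that every process can reach.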

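To do multi-GPU training with another library without having to engineer many things yourself, I suggest PyTorch Lightning, as it has a straightforward API and good documentation for learning how to do multi-GPU training with data parallelism.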

Update: 2022/10/25

The following video explains the different types of distributed training in detail: https://youtu.be/BPYOsDCZbno?t=1011
