RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True

Bha*_*sai 2 python machine-learning deep-learning pytorch

I am a student and a beginner in Python and PyTorch. I have a very basic neural network, and I am running into the RuntimeError mentioned in the title. The code that reproduces the error is this:

import torch 
from torch import nn
from torch import optim
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt

# Ensure Reproducibility
torch.manual_seed(0)

# Data Generation
x = torch.randn((100,1), requires_grad = True)
y = 1 + 2 * x + 0.3 * torch.randn(100,1)
# Shuffles the indices
idx = np.arange(100)
np.random.shuffle(idx)

# Uses first 70 random indices for train
train_idx = idx[:70]
# Uses the remaining indices for validation
val_idx = idx[70:]

# Generates train and validation sets
x_train, y_train = x[train_idx], y[train_idx]
x_val, y_val = x[val_idx], y[val_idx]

class OurFirstNeuralNetwork(nn.Module):
    def __init__(self):
        super(OurFirstNeuralNetwork, self).__init__()
        # Here we "define" our Neural Network Architecture
        self.fc1 = nn.Linear(1, 5)
        self.non_linearity_fc1 = nn.ReLU()
        self.fc2 = nn.Linear(5,1)
        #self.non_linearity_fc2 = nn.ReLU()

    def forward(self, x):
        # The forward pass
        # Here we define how activations "flow" between neurons. We've already discussed the "Sum" and "Transformation" steps of the forward pass.
        sum_fc1 = self.fc1(x)
        transformation_fc1 = self.non_linearity_fc1(sum_fc1)
        sum_fc2 = self.fc2(transformation_fc1)
        #transformation_fc2 = self.non_linearity_fc2(sum_fc2)
        # sum_fc2 is the output of our model (the second non-linearity is commented out), which marks the end of our forward pass.
        return sum_fc2

# Instantiate the model and train

model = OurFirstNeuralNetwork()
print(model)
print(model.state_dict())
n_epochs = 1000
loss_fn = nn.MSELoss(reduction='mean')
optimizer = optim.Adam(model.parameters())

for epoch in range(n_epochs):


    model.train()
    optimizer.zero_grad()
    prediction = model(x_train)
    loss = loss_fn(y_train, prediction)
    print(epoch, loss)
    loss.backward(retain_graph=True)    
    optimizer.step()


print(model.state_dict())

Everything is basic and standard, and this works fine.

However, when I take out the retain_graph=True argument, it throws the RuntimeError. From reading various forums, I understand this has to do with the graph being discarded after the first iteration, but I have seen many tutorials and blogs where a plain loss.backward() is the recommended way, especially since it saves memory. I cannot conceptually understand why it does not work for me.
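From what I have read, even a tiny example like the one below (unrelated to my model; the tensor names are just for illustration) hits the same error when backward is called twice on one graph:

import torch

a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
loss = (a * b).sum()          # the multiplication saves a and b for the backward pass

loss.backward()               # first backward: computes gradients, then frees the saved buffers
loss.backward()               # second backward through the same graph: raises the RuntimeError above
# passing retain_graph=True on the first call keeps the buffers so a second call would work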

Apologies if the way I have asked the question is not in the expected format. I am open to feedback and will gladly provide more details or rephrase the question so that it is easier for everyone to understand. Thanks in advance!

Uma*_*pta 6

You need to add optimizer.zero_grad() after optimizer.step() in order to zero out the gradients.

Why do you need to do this?

When you run loss.backward(), torch computes the gradients with respect to the parameters and stores them in each parameter's .grad attribute. When you then call optimizer.step(), the parameters are updated using .grad, i.e. parameter = parameter - lr * parameter.grad.
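As a rough sketch of that .grad-based update (plain SGD rather than the Adam optimizer used in the question, with made-up stand-in tensors), the manual equivalent of optimizer.step() looks like this:

import torch

w = torch.randn(5, 1, requires_grad=True)   # stand-in for one model parameter
x, y = torch.randn(10, 5), torch.randn(10, 1)
lr = 0.01

loss = ((x @ w - y) ** 2).mean()
loss.backward()                             # fills w.grad with dloss/dw

with torch.no_grad():                       # manual, SGD-style equivalent of optimizer.step()
    w -= lr * w.grad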

Since you are not clearing the gradients and are calling backward a second time, it tries to compute dl/d(updated param), which requires backpropagating through parameter.grad from the first pass. The computation graph for that gradient is not stored when backward runs, so you have to pass retain_graph=True to get rid of the error. However, that is not what we want in order to update the parameters. Instead, we want to clear the gradients and start over with a fresh computation graph, which is why you need to zero the gradients with a .zero_grad() call.
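Roughly, the usual pattern looks like the following self-contained sketch (the model and data here are simplified stand-ins for the ones in the question). Each iteration the forward pass rebuilds the graph, so backward does not need retain_graph:

import torch
from torch import nn, optim

torch.manual_seed(0)
x = torch.randn(100, 1)                    # plain data tensors; they do not need requires_grad
y = 1 + 2 * x + 0.3 * torch.randn(100, 1)

model = nn.Sequential(nn.Linear(1, 5), nn.ReLU(), nn.Linear(5, 1))
loss_fn = nn.MSELoss()
optimizer = optim.Adam(model.parameters())

for epoch in range(1000):
    optimizer.zero_grad()                  # clear the .grad buffers from the previous iteration
    loss = loss_fn(model(x), y)            # the forward pass builds a fresh graph every iteration
    loss.backward()                        # backprop through this iteration's graph only
    optimizer.step()                       # update the parameters from .grad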

Also see: Why do we need to call zero_grad() in PyTorch?

  • Thanks for the explanation, Umang, it is great and definitely makes sense, and I hoped it would work. However, without the retain_graph=True argument it still throws exactly the same error. With retain_graph=True it works fine. (2)
  • Got it, Umang! Thank you very much for the prompt reply and the clear explanation! (2)