PyTorch loss decreases even though every variable has requires_grad = False

Asked by Kar*_*rus (tagged: pytorch)

When I create a neural network with PyTorch, defining the layers with torch.nn.Sequential, the parameters appear to default to requires_grad = False. Yet when I train this network, the loss decreases. How is that possible if the layers are never updated through gradients?

For example, here is the code that defines my network:

import torch

class Network(torch.nn.Module):

    def __init__(self):
        super(Network, self).__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(10, 5),
            torch.nn.Linear(5, 2)
        )
        print('Network Parameters:')
        model_dict = self.state_dict()
        for param_name in model_dict:
            param = model_dict[param_name]
            print('Name: ' + str(param_name))
            print('\tRequires Grad: ' + str(param.requires_grad))

    def forward(self, input):
        prediction = self.layers(input)
        return prediction

This prints:

Network Parameters:
Name: layers.0.weight
    Requires Grad: False
Name: layers.0.bias
    Requires Grad: False
Name: layers.1.weight
    Requires Grad: False
Name: layers.1.bias
    Requires Grad: False

And here is the code that trains my network:

import numpy as np

network = Network()
network.train()
optimiser = torch.optim.SGD(network.parameters(), lr=0.001)
criterion = torch.nn.MSELoss()
inputs = np.random.random([100, 10]).astype(np.float32)
inputs = torch.from_numpy(inputs)
labels = np.random.random([100, 2]).astype(np.float32)
labels = torch.from_numpy(labels)

while True:
    prediction = network(inputs)  # equivalent to network.forward(inputs), via __call__
    loss = criterion(prediction, labels)
    print('loss = ' + str(loss.item()))
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()

This prints:

loss = 0.284633219242
loss = 0.278225809336
loss = 0.271959483624
loss = 0.265835255384
loss = 0.259853869677
loss = 0.254015892744
loss = 0.248321473598
loss = 0.242770522833
loss = 0.237362638116
loss = 0.232097044587
loss = 0.226972639561
loss = 0.221987977624
loss = 0.217141270638
loss = 0.212430402637
loss = 0.207852959633
loss = 0.203406244516
loss = 0.199087426066
loss = 0.19489350915
loss = 0.190821439028
loss = 0.186868071556
loss = 0.183030322194
loss = 0.179305106401
loss = 0.175689414144
loss = 0.172180294991
loss = 0.168774917722
loss = 0.165470585227
loss = 0.162264674902
loss = 0.159154698253

Why does the loss decrease if all parameters have requires_grad = False?

Answered by the*_*dch (score: 5)

This is interesting: there seems to be a difference between state_dict() and parameters().

class Network(torch.nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(10, 5),
            torch.nn.Linear(5, 2)
        )
        print(self.layers[0].weight.requires_grad) # True
        print(self.state_dict()['layers.0.weight'].requires_grad) # False
        print(list(self.parameters())[0].requires_grad) # True

    def forward(self, input):
        prediction = self.layers(input)
        return prediction

So it looks like your loss is decreasing because the network actually is learning: requires_grad really is True. (For debugging, I generally prefer to query the actual objects, e.g. self.layers[0].weight.)
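
As a quick sanity check, here is a minimal sketch (not from the question) of what would happen if the parameters really were frozen: with nothing in the graph requiring gradients, loss.backward() raises a RuntimeError, so the training loop above could not even have run.

import torch

net = torch.nn.Sequential(torch.nn.Linear(10, 5), torch.nn.Linear(5, 2))
for param in net.parameters():
    param.requires_grad_(False)  # actually freeze every weight and bias

inputs = torch.rand(100, 10)
labels = torch.rand(100, 2)
loss = torch.nn.MSELoss()(net(inputs), labels)
try:
    loss.backward()
except RuntimeError as err:
    # "element 0 of tensors does not require grad and does not have a grad_fn"
    print('backward failed:', err)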

[Edit] Aha, found the problem: there is a keep_vars boolean option that can be passed to state_dict(), which (among other things) does the following (https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py#L665):

for name, param in self._parameters.items():
    if param is not None:
        destination[prefix + name] = param if keep_vars else param.data

So if you want the actual Parameter, use keep_vars=True; if you only want the data, use the default keep_vars=False.

所以:

print(self.layers[0].weight.requires_grad) # True
print(self.state_dict(keep_vars=True)['layers.0.weight'].requires_grad) # True
print(list(self.parameters())[0].requires_grad) # True
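
For completeness, a small sketch of why the default looks frozen: param.data is a tensor that shares storage with the Parameter but carries no autograd metadata, so its requires_grad is always False.

import torch

layer = torch.nn.Linear(10, 5)
w = layer.weight
print(w.requires_grad)                    # True: the live Parameter
print(w.data.requires_grad)               # False: detached view, no autograd metadata
print(w.data.data_ptr() == w.data_ptr())  # True: both share the same underlying memory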

  • To summarize: the OP's method of checking `.requires_grad` (via `.state_dict()`) was incorrect, and `.requires_grad` is in fact `True` for all parameters. To read the correct `.requires_grad`, use `.parameters()`, access `layer.weight` directly, or pass `keep_vars=True` to `state_dict()`. (2 upvotes)
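
For auditing all parameters at once, one more option along the same lines (a small sketch) is named_parameters(), which yields the live Parameter objects rather than detached tensors:

import torch

net = torch.nn.Sequential(torch.nn.Linear(10, 5), torch.nn.Linear(5, 2))
for name, param in net.named_parameters():
    print(name, param.requires_grad)  # prints True for every weight and bias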