Can autograd in PyTorch handle repeated use of a layer within the same module?

Question

ihd*_*hdv 6 python neural-network pytorch autograd

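Suppose I have a layer `layer` in a torch module and use it two or more times within a single forward step, so that the output of `layer` is later fed back into the same `layer`. Can PyTorch's autograd correctly compute the gradient of this layer's weights?

Here is what I am talking about: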

import torch
import torch.nn as nn
import torch.nn.functional as F

class net(nn.Module):
    def __init__(self,in_dim,out_dim):
        super(net,self).__init__()
        self.layer = nn.Linear(in_dim,out_dim,bias=False)

    def forward(self,x):
        x = self.layer(x)
        x = self.layer(x)
        return x

input_x = torch.tensor([10.])
label = torch.tensor([5.])
n = net(1,1)
loss_fn = nn.MSELoss()

out = n(input_x)
loss = loss_fn(out,label)
n.zero_grad()
loss.backward()

for param in n.parameters():
    w = param.item()
    g = param.grad

print('Input = %.4f; label = %.4f'%(input_x,label))
print('Weight = %.4f; output = %.4f'%(w,out))
print('Gradient w.r.t. the weight is %.4f'%(g))
print('And it should be %.4f'%(4*(w**2*input_x-label)*w*input_x))

The output is (it may differ on your machine if the weight is initialized to a different value):

Input = 10.0000; label = 5.0000
Weight = 0.9472; output = 8.9717
Gradient w.r.t. the weight is 150.4767
And it should be 150.4766

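In this example I defined a module with a single linear layer (in_dim=out_dim=1 and no bias). w is the weight of this layer, input_x is the input value, and label is the desired value. Since the loss is chosen to be MSE, the formula for the loss is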

((w^2)*input_x-label)^2

Differentiating by hand with respect to w, we have

dloss/dw = 2*((w^2)*input_x-label)*(2*w*input_x)
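As a quick sanity check that does not depend on PyTorch, the hand-derived formula can be compared against a central finite difference. This is only a sketch: the helper names `loss`, `grad_by_hand` and `grad_numeric` are made up for illustration, and the weight is the rounded value from the printed output above.

```python
def loss(w, x, y):
    """Squared error after applying the same scalar weight twice."""
    return (w * w * x - y) ** 2


def grad_by_hand(w, x, y):
    """Chain rule result: 2*(w^2*x - y) * (2*w*x)."""
    return 2.0 * (w * w * x - y) * (2.0 * w * x)


def grad_numeric(w, x, y, eps=1e-6):
    """Central finite-difference approximation of d(loss)/dw."""
    return (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2.0 * eps)


# w is the 4-decimal rounded weight from the printed output (an approximation).
w, x, y = 0.9472, 10.0, 5.0
analytic = grad_by_hand(w, x, y)
numeric = grad_numeric(w, x, y)
print("analytic = %.4f, numeric = %.4f" % (analytic, numeric))
```

Because w is rounded to four decimals here, the result only approximately matches the gradient printed by the original script.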

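The output of my example above shows that autograd gives the same result as the hand calculation, which gives me reason to believe it works in this case. In a real application, however, the layer may have higher-dimensional inputs and outputs, it may be followed by a nonlinear activation function, and the network may have multiple layers.

What I want to ask is: can I trust autograd to handle situations like this one, but much more complicated than my example? And how does it work when a layer is called iteratively?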

Answer 1

a_g*_*est 6

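This will work just fine. From the perspective of the autograd engine this is not a cyclic application, because the resulting computation graph unrolls the repeated computation into a linear sequence. To illustrate, for a single layer you might have: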

x -----> layer --------+
           ^           |
           |  2 times  |
           +-----------+

From the autograd perspective this looks like:

x ---> layer ---> layer ---> layer

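Here layer is the same layer, appearing 3 times in the graph. This means that when the gradient of the layer's weights is computed, the contributions from all three stages are accumulated. So when calling backward: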

x ---> layer ---> layer ---> layer ---> loss_func
                                            |
       lback <--- lback <--- lback <--------+
         |          |          |
         |          v          |
         +------> weights <----+
                   _grad

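Here lback denotes the local derivative of the layer's forward transformation, which takes the upstream gradient as input. Each one adds its contribution to the layer's weights_grad.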
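The accumulation described above can be checked empirically. The sketch below assumes a small 4-dimensional layer followed by tanh (both chosen only for illustration). It applies one shared layer three times, then repeats the computation with three untied copies holding identical values, and compares the shared layer's gradient with the sum of the copies' gradients.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

shared = nn.Linear(4, 4)  # one layer that will be applied three times
# Untied copies with identical values, created before any backward pass.
copies = [copy.deepcopy(shared) for _ in range(3)]

x = torch.randn(8, 4)
target = torch.randn(8, 4)

# Reused layer: three graph nodes that all refer to the same parameters.
h = x
for _ in range(3):
    h = torch.tanh(shared(h))
F.mse_loss(h, target).backward()

# Untied copies: each copy receives only the contribution of its own stage.
h = x
for c in copies:
    h = torch.tanh(c(h))
F.mse_loss(h, target).backward()

summed_w = sum(c.weight.grad for c in copies)
summed_b = sum(c.bias.grad for c in copies)
weights_match = torch.allclose(shared.weight.grad, summed_w, atol=1e-5)
biases_match = torch.allclose(shared.bias.grad, summed_b, atol=1e-5)
print(weights_match, biases_match)
```

If the two flags come out true, the gradient of the reused layer is the sum of the per-stage gradients, which is what the diagram above depicts.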

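Recurrent neural networks are built on exactly this repeated application of layers (cells). See, for example, the tutorial on classifying names with a character-level RNN.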
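For cases more complicated than the scalar example, one way to build confidence is to let `torch.autograd.gradcheck` compare autograd's gradients with finite differences. The following is a minimal sketch of a recurrent-style loop that reuses the same two weight matrices at every time step. The function name `unrolled`, the shapes, and the tanh nonlinearity are assumptions made for illustration and are not taken from the tutorial.

```python
import torch
import torch.nn.functional as F
from torch.autograd import gradcheck

torch.manual_seed(0)


def unrolled(x_seq, w_in, w_rec):
    """Apply the same two weight matrices at every time step of x_seq."""
    # x_seq has shape (steps, batch, features); h has shape (batch, hidden).
    h = torch.zeros(x_seq.shape[1], w_rec.shape[0], dtype=x_seq.dtype)
    for x_t in x_seq:
        h = torch.tanh(F.linear(x_t, w_in) + F.linear(h, w_rec))
    return h.sum()


# gradcheck compares against finite differences, so use double precision.
x_seq = torch.randn(5, 2, 3, dtype=torch.double)
w_in = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
w_rec = torch.randn(4, 4, dtype=torch.double, requires_grad=True)

# Returns True on success and raises an error on a mismatch.
ok = gradcheck(unrolled, (x_seq, w_in, w_rec), eps=1e-6, atol=1e-5)
print(ok)
```

The weights are passed as explicit function inputs here because gradcheck only checks gradients with respect to the inputs that require grad.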
