Sea*_*ake 0 python-3.x pytorch autograd
I am working through the PyTorch tutorial on defining new autograd functions. The autograd function I want to implement is a wrapper around torch.nn.functional.max_pool1d. Here is what I have so far:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.autograd as tag

class SquareAndMaxPool1d(tag.Function):
    @staticmethod
    def forward(ctx, input, kernel_size, stride=None, padding=0, dilation=1,
                return_indices=False, ceil_mode=False):
        ctx.save_for_backward(input)
        inputC = input.clone()  # copy input
        inputC *= inputC
        output = F.max_pool1d(inputC, kernel_size, stride=stride,
                              padding=padding, dilation=dilation,
                              return_indices=return_indices,
                              ceil_mode=ceil_mode)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = get_max_pool1d_grad_somehow(grad_output)
        return 2.0 * input * grad_input
My question is: how do I get the gradient of the wrapped function? I know the example given is simplistic and there are probably other ways to do this particular thing, but what I actually want to do fits this framework and requires me to implement an autograd function.
Edit: after looking at this blog post, I decided to try the following for backward:
def backward(ctx, grad_output):
    input, output = ctx.saved_tensors
    grad_input = output.backward(grad_output)
    return 2.0 * input * grad_input
with output added to the saved variables. I then run the following code:
x = np.random.randn(1,1,5)
xT = torch.from_numpy(x)
xT.requires_grad=True
f = SquareAndMaxPool1d.apply
s = torch.sum(f(xT,2))
s.backward()
and I get Bus error: 10.
Say xT is tensor([[[ 1.69533562, -0.21779421, 2.28693953, -0.86688095, -1.01033497]]], dtype=torch.float64); then after calling s.backward() I would expect xT.grad to be tensor([[[ 3.39067124, -0. , 9.14775812, -0. , -2.02066994]]], dtype=torch.float64), i.e. 2*x*grad_of_max_pool, with grad_of_max_pool containing tensor([[[1., 0., 2., 0., 1.]]], dtype=torch.float64).
I have figured out why I get the Bus error: 10. It appears that the code above leads to a recursive call of my backward at grad_input = output.backward(grad_output). So I still need to find some other way to get the gradient of max_pool1d. I know how to implement this in pure Python, but the result would be much slower than wrapping the library code.
You have picked a rather unfortunate example. torch.nn.functional.max_pool1d is not an instance of torch.autograd.Function, because it is a PyTorch built-in, defined in C++ code with autogenerated Python bindings. I am not sure whether it is possible to get at its backward through its interface.
First of all, in case you haven't noticed, you don't need to write any custom code for backpropagating this formula, because both the power operation and max_pool1d already have backward defined, so their composition is also covered by autograd. Assuming your goal is an exercise, I would suggest you do it more by hand (without falling back to the backward of max_pool1d). An example follows:
import torch
import torch.nn.functional as F
import torch.autograd as tag

class SquareAndMaxPool1d(tag.Function):
    @staticmethod
    def forward(ctx, input, kernel_size, **kwargs):
        # we're gonna need indices for backward. Currently SquareAnd...
        # never actually returns indices, I left it out for simplicity
        kwargs['return_indices'] = True

        input_sqr = input ** 2
        output, indices = F.max_pool1d(input_sqr, kernel_size, **kwargs)
        ctx.save_for_backward(input, indices)

        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, indices = ctx.saved_tensors

        # first we need to reconstruct the gradient of `max_pool1d`
        # by putting all the output gradient elements (corresponding to
        # input elements which made it through the max_pool1d) in their
        # respective places, the rest has gradient of 0. We do it by
        # scattering it against a tensor of 0s
        grad_output_unpooled = torch.zeros_like(input)
        grad_output_unpooled.scatter_(2, indices, grad_output)

        # then incorporate the gradient of the "square" part of your
        # operator
        grad_input = 2. * input * grad_output_unpooled

        # the docs for backward
        # https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function.backward
        # say that "it should return as many tensors, as there were inputs
        # to forward()". It fails to mention that if an argument was not a
        # tensor, it should return None (I remember reading this somewhere,
        # but can't find it anymore). Anyway, we need to
        # return a (grad_input, None) tuple to avoid a complaint that two
        # outputs were expected
        return grad_input, None
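To make the scatter_ trick in backward easier to follow, here is a tiny standalone sketch of how the pooled gradient gets "un-pooled" back onto the input positions that won the max:

import torch
import torch.nn.functional as F

x = torch.tensor([[[1., 3., 2., 5., 4., 0.]]])
# ask max_pool1d for the indices of the maxima inside each window
out, idx = F.max_pool1d(x, kernel_size=2, return_indices=True)
print(out)  # tensor([[[3., 5., 4.]]])
print(idx)  # tensor([[[1, 3, 4]]])

# scatter the pooled gradient back onto the winning positions;
# every other position keeps a gradient of 0
grad_out = torch.ones_like(out)
unpooled = torch.zeros_like(x)
unpooled.scatter_(2, idx, grad_out)
print(unpooled)  # tensor([[[0., 1., 0., 1., 1., 0.]]])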
We can then use the numerical gradient checker to verify that the operation works as expected.
f = SquareAndMaxPool1d.apply
xT = torch.randn(1, 1, 6, requires_grad=True, dtype=torch.float64)
tag.gradcheck(lambda t: f(t, 2), xT)
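As an extra sanity check on the earlier claim that autograd already covers the composition of the square and max_pool1d, the gradient produced by the custom Function can be compared against the one autograd computes for the plain expression (a minimal sketch that reuses the SquareAndMaxPool1d class defined above):

import torch
import torch.nn.functional as F

f = SquareAndMaxPool1d.apply  # the custom Function from above

x1 = torch.randn(1, 1, 6, dtype=torch.float64, requires_grad=True)
x2 = x1.detach().clone().requires_grad_(True)

torch.sum(f(x1, 2)).backward()                  # custom forward/backward
torch.sum(F.max_pool1d(x2 ** 2, 2)).backward()  # plain composition, no custom code
print(torch.allclose(x1.grad, x2.grad))         # expected: True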
I'm sorry if this doesn't answer your question of how to get the backward of max_pool1d, but I hope you find my answer useful enough.
The problem you ran into with the recursive calls actually comes from output and from the fact that, by default, forward in a class inheriting from torch.autograd.Function appears to behave as if it were wrapped in with no_grad. If you check output.grad_fn in forward, it will likely be None, while in backward it will likely link to the function object <SquareAndMaxPool1d...>, which causes the recursive calls. If you are still interested in how to do exactly what you asked, here is an example with F.linear:
import torch
import torch.nn as nn
import torch.nn.functional as F

class custom_Linear(nn.Linear):
    def forward(self, _input):
        return Custom_Linear_AGfn_getAround.apply(_input, self.weight, self.bias)

class Custom_Linear_AGfn_getAround(torch.autograd.Function):
    @staticmethod
    def forward(ctx, _input, _weight, _bias):
        print('Custom forward')
        with torch.enable_grad():
            detached_input = _input.detach()
            detached_input.requires_grad_(True)
            detached_weight = _weight.detach()
            detached_weight.requires_grad_(True)
            detached_bias = _bias.detach()
            detached_bias.requires_grad_(True)
            _tmp = F.linear(detached_input, detached_weight, detached_bias)
        ctx.saved_input = detached_input
        ctx.saved_param = detached_weight, detached_bias
        ctx.save_for_backward(_tmp)
        _output = _tmp.detach()
        return _output

    @staticmethod
    def backward(ctx, grad_out):
        print('Custom backward')
        _tmp, = ctx.saved_tensors
        _weight, _bias = ctx.saved_param
        detached_input = ctx.saved_input
        with torch.enable_grad():
            _tmp.backward(grad_out)
        return detached_input.grad, _weight.grad, _bias.grad
Basically, it just builds a small isolated graph for the part you are interested in without messing with the main graph, and uses grad_fn and requires_grad to keep track of the graphs when working out what needs to be detached and what the isolated graph needs.
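For completeness, here is a short usage sketch of the layer above (the layer sizes are arbitrary); the prints confirm that the custom forward and backward are the ones being called and that the input and parameter gradients get populated:

import torch

layer = custom_Linear(4, 3)                # uses the classes defined above
x = torch.randn(2, 4, requires_grad=True)

out = layer(x)        # prints 'Custom forward'
out.sum().backward()  # prints 'Custom backward'

print(x.grad.shape)             # torch.Size([2, 4])
print(layer.weight.grad.shape)  # torch.Size([3, 4])
print(layer.bias.grad.shape)    # torch.Size([3])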
About the tricky parts:
- Passing _weight and _bias: you can either pass them through save_for_backward, in which case _weight.grad and _bias.grad will be None inside backward but will hold their correct values once outside of it, or you can pass them through a ctx attribute such as ctx.saved_param, in which case you have to manually return None for the last two return values of backward (return detached_input.grad, None, None), otherwise you will get twice the correct value when you check the weight and bias gradients outside backward afterwards.
- As mentioned, backward and forward of a class inheriting from torch.autograd.Function seem to have a with no_grad behaviour by default. Thus, removing with torch.enable_grad(): in the code above results in _tmp.grad_fn being None (I could not understand why _tmp.grad_fn was None and its requires_grad was False in forward by default, even though detached_input requires gradients, until I stumbled on https://github.com/pytorch/pytorch/issues/7698).
- Not detaching _output may give it a double grad_fn: when I had no with torch.enable_grad() and did not detach the output, _tmp.grad_fn was None in forward but did acquire the <Custom_Linear_AGfn_getAround...> grad_fn in backward (which results in infinite recursive calls).
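To illustrate the second point, here is a minimal probe (my own sketch; the Probe class is made up purely for demonstration) showing that operations inside forward are not tracked unless grad is explicitly re-enabled:

import torch

class Probe(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = x * 2
        print(y.grad_fn)  # None: forward runs as if under no_grad
        with torch.enable_grad():
            z = x.detach().requires_grad_(True) * 2
            print(z.grad_fn)  # <MulBackward0 ...>: tracking re-enabled
        return y

    @staticmethod
    def backward(ctx, grad_out):
        return 2 * grad_out

x = torch.randn(3, requires_grad=True)
Probe.apply(x)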