Layer normalization in PyTorch?

Kru*_*ger 10 nlp machine-learning deep-learning pytorch

Shouldn't the layer normalization of x = torch.tensor([[1.5,0,0,0]]) be [[1.5,-0.5,-0.5,-0.5]], according to the equation in this paper and in the PyTorch docs? Yet torch.nn.LayerNorm gives [[ 1.7320, -0.5773, -0.5773, -0.5773]].

Here is the example code:

import torch

x = torch.tensor([[1.5,.0,.0,.0]])
layerNorm = torch.nn.LayerNorm(4, elementwise_affine = False)

y1 = layerNorm(x)

# Manual re-implementation for comparison
mean = x.mean(-1, keepdim = True)
var = x.var(-1, keepdim = True)
y2 = (x-mean)/torch.sqrt(var+layerNorm.eps)

This gives:

y1 == tensor([[ 1.7320, -0.5773, -0.5773, -0.5773]])
y2 == tensor([[ 1.5000, -0.5000, -0.5000, -0.5000]])
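Working the example by hand (ignoring ε) shows that the two outputs differ only in the denominator of the variance: dividing the sum of squared deviations by N = 4 reproduces y1, dividing by N − 1 = 3 reproduces y2.

$$
\begin{aligned}
x &= (1.5,\ 0,\ 0,\ 0), \qquad N = 4, \qquad \mu = \tfrac{1}{N}\textstyle\sum_i x_i = 0.375 \\
x - \mu &= (1.125,\ -0.375,\ -0.375,\ -0.375), \qquad \textstyle\sum_i (x_i - \mu)^2 = 1.6875 \\
\sigma^2 &= 1.6875 / 4 = 0.421875 \;\Rightarrow\; \frac{x - \mu}{\sigma} = \left(\sqrt{3},\ -\tfrac{1}{\sqrt{3}},\ -\tfrac{1}{\sqrt{3}},\ -\tfrac{1}{\sqrt{3}}\right) \approx (1.7321,\ -0.5774,\ -0.5774,\ -0.5774) \\
s^2 &= 1.6875 / 3 = 0.5625 \;\Rightarrow\; \frac{x - \mu}{s} = (1.5,\ -0.5,\ -0.5,\ -0.5)
\end{aligned}
$$

With ε = 1e-5 added to the variance, the fourth decimal of the first line shifts slightly, which gives the printed 1.7320 and −0.5773.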

Anonymous 5

Instead of

var = x.var(-1, keepdim = True)

you should use

var = x.var(-1, keepdim = True, unbiased=False)

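This gives the same result as PyTorch, because torch.nn.LayerNorm computes the variance with the biased estimator (dividing by N), while Tensor.var defaults to the unbiased estimator (dividing by N − 1). Full code: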

import torch

x = torch.tensor([[1.5,.0,.0,.0]])
layerNorm = torch.nn.LayerNorm(4, elementwise_affine = False)
y1 = layerNorm(x)
mean = x.mean(-1, keepdim = True)
# Biased variance (divide by N), matching what LayerNorm uses
var = x.var(-1, keepdim = True, unbiased=False)
y2 = (x-mean)/torch.sqrt(var+layerNorm.eps)
# y2 now matches y1: tensor([[ 1.7320, -0.5773, -0.5773, -0.5773]])
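To see the estimator difference without PyTorch, here is a minimal plain-Python sketch using the standard library's statistics module: `statistics.pvariance` divides by N (the biased estimator LayerNorm uses), and `statistics.variance` divides by N − 1 (the default of `Tensor.var`). The `normalize` helper is hypothetical, for illustration only; PyTorch computes in float32, so its values agree with these only up to rounding.

```python
import math
import statistics


# Hypothetical helper: normalize one row using the given variance estimator.
def normalize(row, variance_fn, eps=1e-5):
    mean = statistics.fmean(row)
    var = variance_fn(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


x = [1.5, 0.0, 0.0, 0.0]

# Population (biased) variance, divides by N: matches torch.nn.LayerNorm.
y_biased = normalize(x, statistics.pvariance)
# Sample (unbiased) variance, divides by N - 1: matches the default Tensor.var.
y_unbiased = normalize(x, statistics.variance)

print([round(v, 4) for v in y_biased])    # [1.732, -0.5773, -0.5773, -0.5773]
print([round(v, 4) for v in y_unbiased])  # [1.5, -0.5, -0.5, -0.5]
```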


Kru*_*ger 1

Apparently, the code should look like this:

...
# Mean of squared deviations = biased variance (divide by N)
var = ((x-mean)**2).mean(-1, keepdim = True)
...

Hope this helps anyone who runs into the same mistake.

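  • I'd say the easiest implementation to follow is here (https://github.com/pytorch/pytorch/blob/066e3ed953dd0bb0f3ca4889bbb7835675afb11f/aten/src/ATen/native/cpu/layer_norm_kernel.cpp#L16-L67 ). `rstd` there stands for "reciprocal standard deviation", i.e. 1 / sqrt(var + eps). (2 upvotes)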
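As a plain-Python sketch of the same idea (not a transcription of the linked kernel), one row can be normalized by computing the mean and rstd = 1 / sqrt(var + eps) once, then multiplying each centered value by rstd instead of dividing by the standard deviation. The name `layer_norm_row` is hypothetical, and eps = 1e-5 is assumed to match the LayerNorm default.

```python
import math


# Hypothetical helper: layer-normalize a single row via the reciprocal std.
def layer_norm_row(row, eps=1e-5):
    n = len(row)
    mean = sum(row) / n
    # Biased variance: divide by n, not n - 1, as torch.nn.LayerNorm does.
    var = sum((v - mean) ** 2 for v in row) / n
    rstd = 1.0 / math.sqrt(var + eps)  # reciprocal standard deviation
    return [(v - mean) * rstd for v in row]


print([round(v, 4) for v in layer_norm_row([1.5, 0.0, 0.0, 0.0])])
# [1.732, -0.5773, -0.5773, -0.5773]
```

Multiplying by a precomputed reciprocal replaces N divisions with one division per row, which is a common reason kernels store `rstd` rather than the standard deviation itself.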