Question (tags: python, normalization, pytorch)
I am trying to understand how torch.nn.LayerNorm works in an NLP model. Assume the input is a batch of sequences of word embeddings:
import torch

batch_size, seq_size, dim = 2, 3, 4
embedding = torch.randn(batch_size, seq_size, dim)
print("x: ", embedding)
layer_norm = torch.nn.LayerNorm(dim)
print("y: ", layer_norm(embedding))
# outputs:
"""
x:  tensor([[[ 0.5909,  0.1326,  0.8100,  0.7631],
         [ 0.5831, -1.7923, -0.1453, -0.6882],
         [ 1.1280,  1.6121, -1.2383,  0.2150]],

        [[-0.2128, -0.5246, -0.0511,  0.2798],
         [ 0.8254,  1.2262, -0.0252, -1.9972],
         [-0.6092, -0.4709, -0.8038, -1.2711]]])
y:  tensor([[[ 0.0626, -1.6495,  0.8810,  0.7060],
         [ 1.2621, -1.4789,  0.4216, -0.2048],
         [ 0.6437,  1.0897, -1.5360, -0.1973]],

        [[-0.2950, -1.3698,  0.2621,  1.4027],
         [ 0.6585,  0.9811, -0.0262, -1.6134],
         [ 0.5934,  1.0505, -0.0497, -1.5942]]],
       grad_fn=<NativeLayerNormBackward0>)
"""
Based on the documentation, my understanding is that the mean and standard deviation are computed over all the embedding values of each sample. So I tried to compute y[0, 0, :] manually:
mean = torch.mean(embedding[0, :, :])
std = torch.std(embedding[0, :, :])
print((embedding[0, 0, :] - mean) / std)
This gives tensor([ 0.4310, -0.0319, 0.6523, 0.6050]), which is not the correct output. What is the right way to compute y[0, 0, :]?
Answer (14 votes):
The PyTorch LayerNorm documentation states that the mean and standard deviation are computed over the last D dimensions. Based on that, I too initially expected that for an input of shape (batch_size, seq_size, embedding_dim) the statistics would be computed over the last two dimensions, (seq_size, embedding_dim), excluding only the batch dim. That is not what happens here: with LayerNorm(embedding_dim), normalized_shape has a single dimension, so only the last dimension is normalized.
A similar question and answer about the layer norm implementation can be found here: Layer Normalization in pytorch?.
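To answer the question directly: for y[0, 0, :], the statistics are taken over that token's four embedding values alone. A minimal check, reusing the question's embedding tensor (and assuming the default eps=1e-5):

# Per-token statistics over the last dim only. LayerNorm uses the biased
# variance estimate, so pass unbiased=False rather than relying on
# torch.std's default Bessel-corrected estimate.
x0 = embedding[0, 0, :]
mean = x0.mean()
var = x0.var(unbiased=False)
print((x0 - mean) / torch.sqrt(var + 1e-5))
# matches y[0, 0, :], since the affine weight and bias initialize to 1 and 0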
The papers below show how layer norm is applied differently in NLP and vision:

"Layer Normalization (LN) operates along the channel dimension."

"LN computes µ and σ along the (C, H, W) axes for each sample."
In the PyTorch documentation's NLP example with a 3-D tensor, the mean and standard deviation are computed over the last dimension only, i.e. over embedding_dim.
This paper describes the same behavior as the PyTorch doc example:

"Nearly all NLP tasks take variable-length sequences as input, which is well matched with LN that only computes statistics over the channel dimension without involving the batch and sequence-length dimensions."
And an example shown in another paper:

"LN normalizes across the channel/feature dimension as shown in Figure 1."
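The snippets below verify this numerically: first the NLP case, where LayerNorm(embedding_dim) normalizes over the last dimension only, then the CNN case, where LayerNorm([C, H, W]) normalizes over the last three dimensions.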
import torch

batch_size, seq_size, dim = 2, 3, 4
last_dims = 4

embedding = torch.randn(batch_size, seq_size, dim)
print("x: ", embedding)

layer_norm = torch.nn.LayerNorm(last_dims, elementwise_affine=False)
layer_norm_out = layer_norm(embedding)
print("y: ", layer_norm_out)

# manual check: statistics over the last dim only, per batch element
eps: float = 0.00001
mean = torch.mean(embedding[0, :, :], dim=-1, keepdim=True)
var = torch.square(embedding[0, :, :] - mean).mean(dim=-1, keepdim=True)
y_custom = (embedding[0, :, :] - mean) / torch.sqrt(var + eps)
print("y_custom: ", y_custom)
assert torch.allclose(layer_norm_out[0], y_custom), 'Tensors do not match.'

mean = torch.mean(embedding[1, :, :], dim=-1, keepdim=True)
var = torch.square(embedding[1, :, :] - mean).mean(dim=-1, keepdim=True)
y_custom = (embedding[1, :, :] - mean) / torch.sqrt(var + eps)
print("y_custom: ", y_custom)
assert torch.allclose(layer_norm_out[1], y_custom), 'Tensors do not match.'

Output:

x:  tensor([[[-0.0594, -0.8702, -1.9837,  0.2914],
         [-0.4774,  1.0372,  0.6425, -1.1357],
         [ 0.3872, -0.9190, -0.5774,  0.3281]],

        [[-0.5548,  0.0815,  0.2333,  0.3569],
         [ 1.0380, -0.1756, -0.7417,  2.2930],
         [-0.0075, -0.3623,  1.9310, -0.7043]]])
y:  tensor([[[ 0.6813, -0.2454, -1.5180,  1.0822],
         [-0.5700,  1.1774,  0.7220, -1.3295],
         [ 1.0285, -1.2779, -0.6747,  0.9241]],

        [[-1.6638,  0.1490,  0.5814,  0.9334],
         [ 0.3720, -0.6668, -1.1513,  1.4462],
         [-0.2171, -0.5644,  1.6809, -0.8994]]])
y_custom:  tensor([[ 0.6813, -0.2454, -1.5180,  1.0822],
        [-0.5700,  1.1774,  0.7220, -1.3295],
        [ 1.0285, -1.2779, -0.6747,  0.9241]])
y_custom:  tensor([[-1.6638,  0.1490,  0.5814,  0.9334],
        [ 0.3720, -0.6668, -1.1513,  1.4462],
        [-0.2171, -0.5644,  1.6809, -0.8994]])

The same check for a 4-D (batch, C, H, W) input:

import torch

batch_size, c, h, w = 2, 3, 2, 4
last_dims = [c, h, w]

embedding = torch.randn(batch_size, c, h, w)
print("x: ", embedding)

layer_norm = torch.nn.LayerNorm(last_dims, elementwise_affine=False)
layer_norm_out = layer_norm(embedding)
print("y: ", layer_norm_out)

# manual check: statistics over the last three dims, per batch element
eps: float = 0.00001
mean = torch.mean(embedding[0, :, :], dim=(-3, -2, -1), keepdim=True)
var = torch.square(embedding[0, :, :] - mean).mean(dim=(-3, -2, -1), keepdim=True)
y_custom = (embedding[0, :, :] - mean) / torch.sqrt(var + eps)
print("y_custom: ", y_custom)
assert torch.allclose(layer_norm_out[0], y_custom), 'Tensors do not match.'

mean = torch.mean(embedding[1, :, :], dim=(-3, -2, -1), keepdim=True)
var = torch.square(embedding[1, :, :] - mean).mean(dim=(-3, -2, -1), keepdim=True)
y_custom = (embedding[1, :, :] - mean) / torch.sqrt(var + eps)
print("y_custom: ", y_custom)
assert torch.allclose(layer_norm_out[1], y_custom), 'Tensors do not match.'

Output:

x:  tensor([[[[ 1.0902, -0.8648,  1.5785,  0.3087],
          [ 0.0249, -1.3477, -0.9565, -1.5024]],

         [[ 1.8024, -0.2894,  0.7284,  0.7822],
          [ 1.4385, -0.2848, -0.3114,  0.4633]],

         [[ 0.9061,  0.3066,  0.9916,  0.9284],
          [ 0.3356,  0.9162, -0.4579,  1.0669]]],


        [[[-0.8292,  0.9111, -0.7307, -1.1003],
          [ 0.3441, -1.9823,  0.1313,  0.2048]],

         [[-0.2838,  0.1147, -0.1605, -0.4637],
          [-2.1343, -0.4402,  1.6685,  0.4455]],

         [[ 0.6895, -2.7331,  1.1693, -0.6999],
          [-0.3497, -0.2942, -0.0028, -1.3541]]]])
y:  tensor([[[[ 0.8653, -1.3279,  1.4131, -0.0114],
          [-0.3298, -1.8697, -1.4309, -2.0433]],

         [[ 1.6643, -0.6824,  0.4594,  0.5198],
          [ 1.2560, -0.6772, -0.7071,  0.1619]],

         [[ 0.6587, -0.0137,  0.7547,  0.6838],
          [ 0.0188,  0.6701, -0.8715,  0.8392]]],


        [[[-0.4938,  1.2220, -0.3967, -0.7610],
          [ 0.6629, -1.6306,  0.4531,  0.5256]],

         [[ 0.0439,  0.4368,  0.1655, -0.1335],
          [-1.7805, -0.1103,  1.9686,  0.7629]],

         [[ 1.0035, -2.3707,  1.4764, -0.3663],
          [-0.0211,  0.0337,  0.3210, -1.0112]]]])
y_custom:  tensor([[[ 0.8653, -1.3279,  1.4131, -0.0114],
         [-0.3298, -1.8697, -1.4309, -2.0433]],

        [[ 1.6643, -0.6824,  0.4594,  0.5198],
         [ 1.2560, -0.6772, -0.7071,  0.1619]],

        [[ 0.6587, -0.0137,  0.7547,  0.6838],
         [ 0.0188,  0.6701, -0.8715,  0.8392]]])
y_custom:  tensor([[[-0.4938,  1.2220, -0.3967, -0.7610],
         [ 0.6629, -1.6306,  0.4531,  0.5256]],

        [[ 0.0439,  0.4368,  0.1655, -0.1335],
         [-1.7805, -0.1103,  1.9686,  0.7629]],

        [[ 1.0035, -2.3707,  1.4764, -0.3663],
         [-0.0211,  0.0337,  0.3210, -1.0112]]])

A generalized version of the manual check:

from typing import List, Union

import torch

batch_size, seq_size, embed_dim = 2, 3, 4
embedding = torch.randn(batch_size, seq_size, embed_dim)
print("x: ", embedding)
print(embedding.shape)
print()

layer_norm = torch.nn.LayerNorm(embed_dim, elementwise_affine=False)
layer_norm_output = layer_norm(embedding)
print("y: ", layer_norm_output)
print(layer_norm_output.shape)
print()


def custom_layer_norm(
    x: torch.Tensor, dim: Union[int, List[int]] = -1, eps: float = 0.00001
) -> torch.Tensor:
    # dim may be a single int or a list of ints (the dims to normalize over)
    mean = torch.mean(x, dim=dim, keepdim=True)
    var = torch.square(x - mean).mean(dim=dim, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)


custom_layer_norm_output = custom_layer_norm(embedding)
print("y_custom: ", custom_layer_norm_output)
print(custom_layer_norm_output.shape)

assert torch.allclose(layer_norm_output, custom_layer_norm_output), 'Tensors do not match.'

Output:

x:  tensor([[[-0.4808, -0.1981,  0.4538, -1.2653],
         [ 0.3578,  0.6592,  0.2161,  0.3852],
         [ 1.2184, -0.4238, -0.3415, -0.3487]],

        [[ 0.9874, -1.7737,  0.1886,  0.0448],
         [-0.5162,  0.7872, -0.3433, -0.3266],
         [-0.5459, -0.0371,  1.2625, -1.6030]]])
torch.Size([2, 3, 4])

y:  tensor([[[-0.1755,  0.2829,  1.3397, -1.4471],
         [-0.2916,  1.5871, -1.1747, -0.1208],
         [ 1.7301, -0.6528, -0.5334, -0.5439]],

        [[ 1.1142, -1.6189,  0.3235,  0.1812],
         [-0.8048,  1.7141, -0.4709, -0.4384],
         [-0.3057,  0.1880,  1.4489, -1.3312]]])
torch.Size([2, 3, 4])

y_custom:  tensor([[[-0.1755,  0.2829,  1.3397, -1.4471],
         [-0.2916,  1.5871, -1.1747, -0.1208],
         [ 1.7301, -0.6528, -0.5334, -0.5439]],

        [[ 1.1142, -1.6189,  0.3235,  0.1812],
         [-0.8048,  1.7141, -0.4709, -0.4384],
         [-0.3057,  0.1880,  1.4489, -1.3312]]])
torch.Size([2, 3, 4])
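Two follow-up notes. First, torch.std applies Bessel's correction by default while LayerNorm uses the biased variance, so the question's manual attempt would not match exactly even over the right dimensions. Second, the behavior the question expected, normalizing over both seq_size and embedding_dim, is available by passing both trailing dims as normalized_shape. A minimal sketch, using fresh toy shapes:

import torch

batch_size, seq_size, dim = 2, 3, 4
embedding = torch.randn(batch_size, seq_size, dim)

# Normalize over the last TWO dims, i.e. per sample across the whole
# (seq_size, dim) slice -- what the question's manual attempt computed,
# apart from torch.std's Bessel correction.
ln2 = torch.nn.LayerNorm([seq_size, dim], elementwise_affine=False)

mean = embedding.mean(dim=(-2, -1), keepdim=True)
var = embedding.var(dim=(-2, -1), unbiased=False, keepdim=True)  # biased, like LayerNorm
manual = (embedding - mean) / torch.sqrt(var + 1e-5)

assert torch.allclose(ln2(embedding), manual), 'Tensors do not match.'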