Should I transpose the features or the weights in a neural network?

Question

Fra*_*nva 5 python machine-learning neural-network pytorch

I am learning about neural networks.

Here is the full code: https://github.com/udacity/deep-learning-v2-pytorch/blob/master/intro-to-pytorch/Part%201%20-%20Tensors%20in%20PyTorch%20(Exercises).ipynb

When I transpose the features, I get the following output:

import torch
def activation(x):
    return 1/(1+torch.exp(-x))

### Generate some data
torch.manual_seed(7) # Set the random seed so things are predictable

# Features are 5 random normal variables
features = torch.randn((1, 5))
# True weights for our data, random normal variables again
weights = torch.randn_like(features)
# and a true bias term
bias = torch.randn((1, 1))

product = features.t() * weights + bias
output = activation(product.sum())

tensor(0.9897)

However, if I transpose the weights instead, I get a different output:

weights_prime = weights.view(5,1)
prod = torch.mm(features, weights_prime) + bias
y_hat = activation(prod.sum())

tensor(0.1595)

Why does this happen?

Update

I looked at the solution: https://github.com/udacity/deep-learning-v2-pytorch/blob/master/intro-to-pytorch/Part%201%20-%20Tensors%20in%20PyTorch%20(Solution).ipynb

There I saw this:

y = activation((features * weights).sum() + bias)

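Why can the features matrix, of shape (1, 5), be multiplied by the weights matrix, also of shape (1, 5), without transposing the weights first?

Update 2

After reading a few posts, I realized that

matrixA * matrixB is different from torch.mm(matrixA, matrixB) and torch.matmul(matrixA, matrixB).

Can someone confirm the following three points of my understanding?

So * means elementwise multiplication, while torch.mm() and torch.matmul() are matrix multiplication.
The difference between torch.mm() and torch.matmul(): mm() is specifically for 2-D matrices, while matmul() can be used for more complex cases.
In the neural network from the Udacity coding exercise linked above, elementwise multiplication is what is needed.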

Update 3

For anyone with the same confusion, here is a screenshot from the video:

Here is the video link: https://www.youtube.com/watch?time_continue=98&v=6Z7WntXays8&feature=emb_logo

Answer 1

Ale*_*x I 1

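See https://pytorch.org/docs/master/generated/torch.nn.Linear.html

A typical linear (fully connected) layer in torch takes input features of shape (N, *, in_features) and weights of shape (out_features, in_features), and produces an output of shape (N, *, out_features). Here N is the batch size and * is any number of additional dimensions (possibly none).

Leaving the bias term aside, the implementation is:

output = input.matmul(weight.t())

So the answer is that, by convention, neither of your formulas is correct; the standard formula is the one above.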
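
To make the convention concrete, here is a minimal sketch that compares the hand-written formula with `torch.nn.functional.linear`. The sizes, the seed, and the variable names (`x`, `manual`, `builtin`) are illustrative assumptions, not taken from the question, and a bias is added on both sides so the two results are comparable.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only so the run is repeatable

batch_size, in_features, out_features = 1, 5, 1  # illustrative sizes

x = torch.randn(batch_size, in_features)         # input:  (N, in_features)
weight = torch.randn(out_features, in_features)  # weight: (out_features, in_features)
bias = torch.randn(out_features)                 # bias:   (out_features,)

# The convention written out by hand: input on the left, transposed weight on the right.
manual = x.matmul(weight.t()) + bias

# The same computation through the functional form of a linear layer.
builtin = F.linear(x, weight, bias)

print(manual.shape)                      # torch.Size([1, 1])
print(torch.allclose(manual, builtin))   # whether the two agree up to float tolerance
```

The output has shape (N, out_features), which is what the next layer would expect as its input.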

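You can use non-standard shapes since you are implementing this from scratch; as long as you are consistent it will work, but I don't recommend it for learning. It is not clear what the 1 and the 5 in your code stand for, but presumably you want 5 input features and 1 output feature, with a batch size of 1 as well. In that case the standard shapes would be input = torch.randn((1, 5)) (batch size = 1 and in_features = 5) and weights = torch.randn((5, 1)) (in_features = 5 and out_features = 1). With the weights stored this way, input.matmul(weights) needs no transpose; in the nn.Linear layout the weight would instead have shape (1, 5) and be transposed inside the matmul.

There is no reason for the weights to have the same shape as the features, so weights = torch.randn_like(features) does not make sense.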

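Finally, regarding your actual questions:

"Should I transpose the features or the weights in a neural network?" - In the torch convention, you transpose the weights, and the features go first in the matmul. Other frameworks may use different conventions; it works as long as the in_features dimension of the weights is the one multiplied against the num_features dimension of the input.

"Why does this happen?" - These are two completely different computations; there is no reason to expect them to produce the same result.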
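
To see how different the two computations are, the following sketch reruns the question's setup and prints the shapes involved. It only uses names from the question plus one helper variable, `expected_sum`, which is introduced here for illustration. No numeric outputs are quoted, since the random draws may depend on the PyTorch version.

```python
import torch

torch.manual_seed(7)  # same seed as in the question

features = torch.randn((1, 5))
weights = torch.randn_like(features)
bias = torch.randn((1, 1))

# First version: (5, 1) * (1, 5) is elementwise with broadcasting,
# so it yields a 5x5 table of all pairwise products f_i * w_j.
product = features.t() * weights + bias
print(product.shape)  # torch.Size([5, 5])

# Second version: (1, 5) @ (5, 1) is a true matrix product,
# i.e. the single dot product sum_i f_i * w_i.
prod = torch.mm(features, weights.view(5, 1)) + bias
print(prod.shape)     # torch.Size([1, 1])

# Summing the 5x5 table adds up 25 products and 25 copies of the bias.
expected_sum = features.sum() * weights.sum() + 25 * bias.sum()
print(torch.allclose(product.sum(), expected_sum, atol=1e-4))
```

So the first version feeds the activation with a sum over 25 cross terms, while the second feeds it one dot product plus one bias.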

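"So * means elementwise multiplication, while torch.mm() and torch.matmul() are matrix multiplication." - Yes; mm is matrix-matrix only, while matmul handles vector-matrix and matrix-matrix products, including batched versions of both. Check the documentation for everything matmul can do (it is quite a lot).

"The difference between torch.mm() and torch.matmul(): mm() is specifically for 2-D matrices, while matmul() can be used for more complex cases." - Yes; the biggest difference is that matmul can broadcast. Use it when that is what you explicitly intend; use mm to guard against unintentional broadcasting.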
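
A short sketch of that difference, using made-up tensors (`a`, `m`, `batch`) and a hypothetical flag `mm_rejected` that are not part of the question:

```python
import torch

a = torch.randn(5)            # 1-D vector
m = torch.randn(5, 3)         # 2-D matrix
batch = torch.randn(4, 2, 5)  # batch of four 2x5 matrices

# matmul accepts a vector-matrix product and returns a 1-D result.
print(torch.matmul(a, m).shape)      # torch.Size([3])

# matmul also broadcasts the 2-D matrix across the batch dimension.
print(torch.matmul(batch, m).shape)  # torch.Size([4, 2, 3])

# mm insists on two 2-D matrices and raises for anything else.
try:
    torch.mm(batch, m)
    mm_rejected = False
except RuntimeError:
    mm_rejected = True
print(mm_rejected)
```

The error from mm is the useful part: a shape mistake surfaces immediately instead of being silently broadcast.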

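"In the neural network from the Udacity coding exercise linked above, elementwise multiplication is what is needed." - I doubt it; this is probably a bug in the Udacity code. The line weights = torch.randn_like(features) looks like a mistake in any case; the dimensions of the weights have a different meaning from the dimensions of the features.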
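
For the specific shapes in the question (one example, one output unit), it may help to check how the solution notebook's line relates to the matrix product. This is only a sketch under that assumption; the names `elementwise` and `matrix` are introduced here for illustration.

```python
import torch

torch.manual_seed(7)  # same seed as in the question

features = torch.randn((1, 5))
weights = torch.randn_like(features)
bias = torch.randn((1, 1))

# Form quoted from the solution notebook: elementwise product, then a sum.
elementwise = (features * weights).sum() + bias

# Matrix-product form with the weights reshaped to a (5, 1) column.
matrix = torch.mm(features, weights.view(5, 1)) + bias

print(elementwise.shape, matrix.shape)
print(torch.allclose(elementwise, matrix, atol=1e-6))
```

Both lines compute the same dot product here, but the elementwise form does not generalize: with more than one example or more than one output unit, the sum would collapse dimensions that the matrix product keeps separate.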
