Why does tf.matmul(a, b, transpose_b=True) work, but tf.matmul(a, tf.transpose(b)) does not?

Question


use*_*614 5 python linear-algebra matrix-multiplication deep-learning tensorflow

Code:

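import tensorflow as tf

# tf.ones is used here because, in TF 2.x, tf.constant([1., 2., 3.], shape=(3, 2, 4))
# raises an error: 3 values cannot fill 24 elements.
x = tf.ones((3, 2, 4))
y = tf.ones((3, 21, 4))
tf.matmul(x, y)                      # Doesn't work.
tf.matmul(x, y, transpose_b=True)    # This works. Shape is (3, 2, 21).
tf.matmul(x, tf.transpose(y))        # Doesn't work.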


I want to know what shape y takes inside tf.matmul(x, y, transpose_b=True), so that I can work out what is actually happening inside an LSTM with attention.

Answer 1

Max*_*xim 7

For tensors of rank > 2, transposition can be defined in more than one way. The difference here is which axes tf.transpose swaps and which axes tf.matmul(..., transpose_b=True) swaps.

By default, tf.transpose does the following:

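The returned tensor's dimension i will correspond to the input dimension perm[i]. If perm is not given, it is set to (n-1...0), where n is the rank of the input tensor. Hence by default, this operation performs a regular matrix transpose on 2-D input tensors.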

So in your case it turns y into a tensor of shape (4, 21, 3), which is not compatible with x (see below).
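The default axis reversal can be checked directly. The following is a minimal sketch, assuming TensorFlow 2.x with eager execution and using tf.ones as stand-in data (the values are not from the question, only the shapes are):

```python
import tensorflow as tf

x = tf.ones((3, 2, 4))
y = tf.ones((3, 21, 4))

# With no perm argument, tf.transpose reverses every axis: perm = [2, 1, 0].
y_t = tf.transpose(y)
print(y_t.shape)  # (4, 21, 3)

# The batch dimensions (3 vs 4) and the inner dimensions (4 vs 21) both
# disagree, so the product is rejected.
try:
    tf.matmul(x, y_t)
    matmul_failed = False
except (tf.errors.InvalidArgumentError, ValueError):
    matmul_failed = True
print(matmul_failed)  # True
```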

But if you set perm=[0, 2, 1], the result is compatible:

# Works! (3, 2, 4) * (3, 4, 21) -> (3, 2, 21).
tf.matmul(x, tf.transpose(y, [0, 2, 1]))
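To answer the shape question directly: transpose_b=True swaps only the last two axes of y, so y is used as a (3, 4, 21) tensor. Here is a hedged sketch of that equivalence, assuming TensorFlow 2.x and random stand-in data. tf.linalg.matrix_transpose is included as another way to swap just the last two axes.

```python
import tensorflow as tf

x = tf.random.normal((3, 2, 4))
y = tf.random.normal((3, 21, 4))

# Two ways to swap only the last two axes of y; the batch axis stays in place.
y_perm = tf.transpose(y, perm=[0, 2, 1])
y_mt = tf.linalg.matrix_transpose(y)
print(y_perm.shape, y_mt.shape)  # (3, 4, 21) (3, 4, 21)

via_flag = tf.matmul(x, y, transpose_b=True)
via_perm = tf.matmul(x, y_perm)
print(via_flag.shape)  # (3, 2, 21)

# Largest element-wise difference between the two results (expected to be ~0).
max_diff = float(tf.reduce_max(tf.abs(via_flag - via_perm)))
```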


About `tf.matmul`

You can compute the product (a, b, c) * (a, c, d) -> (a, b, d). However, this is not a tensor dot product; it is a batch operation (see this question).

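In this case, the leading dimension a is treated as the batch size, so tf.matmul computes one (b, c) * (c, d) matrix product for each of the a batch entries.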
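The batch interpretation can be illustrated by building the same result one batch entry at a time. This is a sketch under assumed sizes (a=3, b=2, c=4, d=21, chosen to match the question's shapes) with random stand-in data. The tf.tensordot call is added only for contrast, to show what a true tensor contraction over the same axes produces.

```python
import tensorflow as tf

a, b, c, d = 3, 2, 4, 21
lhs = tf.random.normal((a, b, c))
rhs = tf.random.normal((a, c, d))

# Batched product: one (b, c) x (c, d) matrix product per batch entry.
batched = tf.matmul(lhs, rhs)
print(batched.shape)  # (3, 2, 21)

# The same result, computed explicitly per batch entry and stacked.
looped = tf.stack([tf.matmul(lhs[i], rhs[i]) for i in range(a)])
max_diff = float(tf.reduce_max(tf.abs(batched - looped)))

# A tensor dot product contracts c across all pairs of batch entries,
# so both batch axes survive in the output.
contracted = tf.tensordot(lhs, rhs, axes=[[2], [1]])
print(contracted.shape)  # (3, 2, 3, 21)
```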

The batch can span more than one dimension, so this is also valid:

(a, b, c, d) * (a, b, d, e) -> (a, b, c, e)
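A small sketch of the two-batch-dimension case, assuming TensorFlow 2.x and illustrative sizes (5, 3 for the batch axes) that do not come from the original post:

```python
import tensorflow as tf

p = tf.ones((5, 3, 2, 4))    # (a, b, c, d)
q = tf.ones((5, 3, 4, 21))   # (a, b, d, e)

# The leading two axes are batch axes; only the last two are multiplied.
out = tf.matmul(p, q)
print(out.shape)  # (5, 3, 2, 21)

# transpose_b=True still swaps only the last two axes of its argument.
q_alt = tf.ones((5, 3, 21, 4))
out_alt = tf.matmul(p, q_alt, transpose_b=True)
print(out_alt.shape)  # (5, 3, 2, 21)
```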


Archived: 7 years, 10 months ago
Views: 3556
Last activity: 7 years, 10 months ago
