Calculating the number of parameters of a GRU layer (Keras)

Question

Abi*_*cov 6 lstm tensorflow gated-recurrent-unit

Why does the GRU layer have 9600 parameters?

Shouldn't it be ((16+32)*32 + 32) * 3 * 2 = 9,408?

Or, rearranged,

32*(16 + 32 + 1)*3*2 = 9408

import tensorflow as tf

model = tf.keras.Sequential([
    # 4500-word vocabulary, 16-dimensional embeddings, sequences of length 200
    tf.keras.layers.Embedding(input_dim=4500, output_dim=16, input_length=200),
    # one forward and one backward GRU with 32 units each
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32)),
    tf.keras.layers.Dense(6, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()


Answer 1

gis*_*ang 8

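The key is that TensorFlow keeps separate biases for the input kernel and the recurrent kernel when reset_after=True in GRUCell. You can see this in part of the GRUCell source code: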

if self.use_bias:
    if not self.reset_after:
        bias_shape = (3 * self.units,)
    else:
        # separate biases for input and recurrent kernels
        # Note: the shape is intentionally different from CuDNNGRU biases
        # `(2 * 3 * self.units,)`, so that we can distinguish the classes
        # when loading and converting saved weights.
        bias_shape = (2, 3 * self.units)
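
The shapes implied by the snippet above can be sketched in plain Python. This is only an illustration, not library code: the helper name gru_weight_shapes is made up for this example, and it assumes use_bias=True and the usual Keras layout in which the three gates are concatenated along the last axis.

```python
def gru_weight_shapes(input_dim, units, reset_after=True):
    """Weight shapes of one (unidirectional) GRU layer, assuming use_bias=True.

    Hypothetical helper for illustration; it mirrors the bias_shape logic
    quoted from GRUCell but is not part of TensorFlow.
    """
    return {
        # input-to-hidden weights for the three gates, concatenated
        "kernel": (input_dim, 3 * units),
        # hidden-to-hidden weights for the three gates, concatenated
        "recurrent_kernel": (units, 3 * units),
        # two bias rows (input and recurrent) only when reset_after=True
        "bias": (2, 3 * units) if reset_after else (3 * units,),
    }


shapes_after = gru_weight_shapes(16, 32, reset_after=True)
shapes_before = gru_weight_shapes(16, 32, reset_after=False)

for name in shapes_after:
    print(name, shapes_after[name], shapes_before[name])
```

With input_dim=16 and units=32 as in the question, only the bias entry differs between the two settings.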


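Taking the reset gate as an example, the formula we usually see is:

$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$

But if we set reset_after=True, the formula actually used is:

$$r_t = \sigma(W_r x_t + b_{ir} + U_r h_{t-1} + b_{hr})$$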

As you can see, the default for GRU is reset_after=True in TensorFlow 2, whereas the default is reset_after=False in TensorFlow 1.x.

So the number of parameters of the GRU layer should be ((16+32)*32 + 32 + 32) * 3 * 2 = 9600 in TensorFlow 2.
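
The arithmetic can be checked with a few lines of plain Python. This is a sketch rather than anything taken from TensorFlow: the function name gru_params is hypothetical, and the counts for the other layers assume the model from the question, with Bidirectional concatenating both directions into a 64-dimensional output.

```python
def gru_params(input_dim, units, reset_after=True):
    """Parameter count of one unidirectional GRU layer with biases.

    Hypothetical helper for illustration only.
    """
    kernel = input_dim * 3 * units          # input weights for 3 gates
    recurrent = units * 3 * units           # recurrent weights for 3 gates
    bias = (2 if reset_after else 1) * 3 * units
    return kernel + recurrent + bias


# Bidirectional wraps two independent GRUs, hence the factor 2.
bi_gru_tf2 = 2 * gru_params(16, 32, reset_after=True)
bi_gru_tf1 = 2 * gru_params(16, 32, reset_after=False)

# Remaining layers of the model in the question.
embedding = 4500 * 16        # vocabulary size * embedding dimension
dense_6 = (2 * 32) * 6 + 6   # 64 concatenated features -> 6 units, plus biases
dense_1 = 6 * 1 + 1          # 6 features -> 1 unit, plus bias

total_tf2 = embedding + bi_gru_tf2 + dense_6 + dense_1

print(bi_gru_tf2, bi_gru_tf1, total_tf2)
```

The two GRU counts differ by 192, which is the extra 3 * 32 recurrent biases in each of the two directions.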

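Thank you! I tried both True and False for reset_after, and it is exactly as you said. Do you know what the point of adding two separate bias terms is? The model could always set b_combined = b_input + b_recurrent, so what is gained? (As far as I can tell, the only way it could make a difference is if the same bias were used in a computation elsewhere in the model.) (2 upvotes)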
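
One possible answer to this comment: for the update and reset gates the two biases do add up and could be merged, but for the candidate state with reset_after=True the recurrent bias is added before the multiplication by the reset gate, so it is scaled by r while the input bias is not. The scalar sketch below illustrates this under that assumption; all names and values are made up for the example.

```python
import math


def candidate_two_biases(x, h, w, u, b_in, b_rec, r):
    # reset_after=True style: the recurrent bias sits inside the product with r
    return math.tanh(w * x + b_in + r * (u * h + b_rec))


def candidate_merged_bias(x, h, w, u, b_in, b_rec, r):
    # hypothetical merge b_combined = b_in + b_rec, applied outside the product
    return math.tanh(w * x + (b_in + b_rec) + r * (u * h))


# Arbitrary illustrative values (not from the original post).
x, h, w, u = 0.5, -0.3, 0.8, 1.2
b_in, b_rec = 0.1, 0.4

# With a reset gate below 1 the two formulations disagree.
gap_partial = abs(candidate_two_biases(x, h, w, u, b_in, b_rec, 0.25)
                  - candidate_merged_bias(x, h, w, u, b_in, b_rec, 0.25))

# With a fully open reset gate (r = 1) they coincide.
gap_open = abs(candidate_two_biases(x, h, w, u, b_in, b_rec, 1.0)
               - candidate_merged_bias(x, h, w, u, b_in, b_rec, 1.0))

print(gap_partial, gap_open)
```

Under this reading, the two bias rows are not redundant because the reset gate treats them differently in the candidate state.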

Archived:	6 years, 7 months ago
Views:	2,703
Last activity:	6 years, 3 months ago