Does tf.math.reduce_max allow gradient flow like torch.max?

Question

Sil*_*tya 3 deep-learning tensorflow pytorch

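I am trying to build a multi-label binary classification model in TensorFlow. The model has a tf.math.reduce_max operator between two layers (it is not max pooling; it serves a different purpose).

The number of classes is 3.

I am using binary cross-entropy loss with the Adam optimizer.

Even after several hours of training, when I check the predictions, all of them fall in the range 0.49 to 0.51.

The model does not seem to be learning anything and is making random predictions, which makes me think that using tf.math.reduce_max may be causing the problem.

However, I have read online that the torch.max function allows gradients to backpropagate through it.

When I inspect the graph in TensorBoard, it appears disconnected at the tf.math.reduce_max op. So does this operator allow gradients to backpropagate through it?

Edit: adding the code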

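import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Input
from tensorflow.keras.models import Model

input_tensor = Input(shape=(256, 256, 3))
base_model_toc = VGG16(input_tensor=input_tensor, weights='imagenet', pooling=None, include_top=False)

x = base_model_toc.output
x = GlobalAveragePooling2D()(x)
# Max over axis 0 (the batch axis), keeping a leading dimension of size 1
x = tf.math.reduce_max(x, axis=0, keepdims=True)
x = Dense(1024, activation='relu')(x)
output_1 = Dense(3, activation='sigmoid')(x)

model_a = Model(inputs=base_model_toc.input, outputs=output_1)

# Make every VGG16 layer trainable
for layer in base_model_toc.layers:
    layer.trainable = True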

tf.math.reduce_max is applied with axis=0 because that is what this model requires.

The optimizer I am using is Adam, with an initial learning rate of 0.00001.
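For readers unsure what the reduction in the code above does to shapes: the output of GlobalAveragePooling2D has shape (batch, channels), so axis 0 is the batch axis. The following is a minimal plain-Python stand-in, not TensorFlow code; the helper name `reduce_max_axis0_keepdims` and the sample values are hypothetical and only illustrate the shape change.

```python
def reduce_max_axis0_keepdims(batch):
    # batch: list of rows, each row a list of features -> shape (N, F)
    # result: a single row holding the per-feature maximum -> shape (1, F)
    return [[max(col) for col in zip(*batch)]]

# Hypothetical pooled features for a batch of 4 samples with 3 features each
features = [
    [0.1, 0.9, 0.3],
    [0.7, 0.2, 0.5],
    [0.4, 0.6, 0.8],
    [0.0, 0.1, 0.2],
]

pooled = reduce_max_axis0_keepdims(features)

print(len(features), len(features[0]))  # 4 3
print(len(pooled), len(pooled[0]))      # 1 3
print(pooled)                           # [[0.7, 0.9, 0.8]]
```

In this sketch, four input rows become one output row, so every layer after the reduction sees a leading dimension of 1 regardless of how many samples went in.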

Answer 1

jde*_*esa 6

Yes, tf.math.reduce_max does allow gradients to flow. It is easy to check (this is TensorFlow 2.x, but the result is the same in 1.x):

import tensorflow as tf

with tf.GradientTape() as tape:
    x = tf.linspace(0., 2. * 3.1416, 10)
    tape.watch(x)
    # A sequence of operations involving reduce_max
    y = tf.math.square(tf.math.reduce_max(tf.math.sin(x)))
# Check gradients
g = tape.gradient(y, x)
print(g.numpy())
# [ 0.         0.         0.3420142 -0.        -0.        -0.
#  -0.         0.         0.         0.       ]

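As you can see, there is a valid gradient of y with respect to x. Only one entry is nonzero, because that is the element that produced the maximum, so it is the only element of x that affects y. This is the correct gradient for the operation.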
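The same routing of the gradient to the maximizing element can be checked without TensorFlow. Below is a minimal sketch in plain Python, assuming the same ten sample points as the snippet above. It compares a central finite-difference gradient against the analytic one, 2·sin(x_k)·cos(x_k) at the argmax index k and zero elsewhere. The helpers `f` and `numerical_grad` are hypothetical names for illustration, not part of any library.

```python
import math

def f(xs):
    # Same scalar function as above: square of the max of sin(x)
    return max(math.sin(v) for v in xs) ** 2

def numerical_grad(func, xs, eps=1e-6):
    # Central finite differences, perturbing one input element at a time
    grads = []
    for i in range(len(xs)):
        up = list(xs)
        up[i] += eps
        down = list(xs)
        down[i] -= eps
        grads.append((func(up) - func(down)) / (2 * eps))
    return grads

# Same 10 points as tf.linspace(0., 2. * 3.1416, 10)
n = 10
xs = [i * (2.0 * 3.1416) / (n - 1) for i in range(n)]

num = numerical_grad(f, xs)

# Analytic gradient: only the argmax element receives a nonzero value
k = max(range(n), key=lambda i: math.sin(xs[i]))
analytic = [0.0] * n
analytic[k] = 2.0 * math.sin(xs[k]) * math.cos(xs[k])

print(k)  # 2
print([round(g, 4) for g in num])
print([round(g, 4) for g in analytic])
```

Perturbing any element other than the maximizing one leaves the maximum unchanged, which is why those entries of the gradient are zero.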
