Getting the gradient when using tf.function

Question

mar*_*lon 7 python decorator tensorflow2.0 gradienttape

I'm puzzled by the behavior I observe in the following example:

import tensorflow as tf

@tf.function
def f(a):
    c = a * 2
    b = tf.reduce_sum(c ** 2 + 2 * c)
    return b, c

def fplain(a):
    c = a * 2
    b = tf.reduce_sum(c ** 2 + 2 * c)
    return b, c


a = tf.Variable([[0., 1.], [1., 0.]])

with tf.GradientTape() as tape:
    b, c = f(a)
    
print('tf.function gradient: ', tape.gradient([b], [c]))

# outputs: tf.function gradient:  [None]

with tf.GradientTape() as tape:
    b, c = fplain(a)
    
print('plain gradient: ', tape.gradient([b], [c]))

# outputs: plain gradient:  [<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
# array([[2., 6.],
#        [6., 2.]], dtype=float32)>]

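The second result (from fplain) is what I expected. How should I understand the @tf.function case?

Thank you very much in advance!

(Note that this question is different from "Missing gradient when using tf.function", because here all computations happen inside the function.)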

Answer 1

use*_*327 9

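The gradient tape does not record the individual operations inside the tf.Graph generated by @tf.function; it treats the function as a whole. Roughly speaking, f is applied to a, and the tape records the gradient of f's outputs with respect to its input a (the only watched variable, as tape.watched_variables() shows). From the tape's point of view, b and c are both outputs of the same single call, so there is no recorded path from c to b and the gradient of b with respect to c is None.

In the second case, no graph is generated and the operations are applied in eager mode, so everything works as expected.

A good practice is to wrap the most computationally expensive function in @tf.function (often the training loop). In your case, it would look something like: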
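To make the point above concrete, here is a minimal sketch that reuses f and a from the question and asks the outer tape for the gradient of b with respect to both a and c. The names grad_a and grad_c are introduced here for illustration only. Since the tape sees the decorated call as one unit, the gradient with respect to the input a should be available, while the one with respect to the intermediate c should come back as None, as reported in the question.

```python
import tensorflow as tf

@tf.function
def f(a):
    c = a * 2
    b = tf.reduce_sum(c ** 2 + 2 * c)
    return b, c

a = tf.Variable([[0., 1.], [1., 0.]])

with tf.GradientTape() as tape:
    b, c = f(a)

# One gradient call with two sources: the function input and its second output.
grad_a, grad_c = tape.gradient(b, [a, c])

# Variables the tape watched while f was called.
print('watched:', [v.name for v in tape.watched_variables()])
print('db/da:', grad_a)  # d b / d a = (2c + 2) * 2
print('db/dc:', grad_c)  # no recorded path from c to b on the outer tape
```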

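import tensorflow as tf

@tf.function
def f(a):
    # Record the ops inside the traced function so the tape sees c.
    with tf.GradientTape() as tape:
        c = a * 2
        b = tf.reduce_sum(c ** 2 + 2 * c)
    grads = tape.gradient([b], [c])
    # tf.print runs on every call; Python print would run only while tracing.
    tf.print('tf.function gradient: ', grads)
    return grads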
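As a usage sketch for this pattern, the variant below (named f_with_grad here, a hypothetical name not taken from the answer) opens the tape inside the decorated function and is called on the same variable a as in the question. Because the tape is active while the function body is traced, it records the step from c to b, so the result should match the eager fplain case.

```python
import tensorflow as tf

@tf.function
def f_with_grad(a):
    # The tape is opened inside the traced function, so it records c -> b.
    with tf.GradientTape() as tape:
        c = a * 2
        b = tf.reduce_sum(c ** 2 + 2 * c)
    return tape.gradient([b], [c])

a = tf.Variable([[0., 1.], [1., 0.]])
grads = f_with_grad(a)

# d b / d c = 2c + 2, evaluated at c = [[0, 2], [2, 0]]
print(grads[0].numpy())
```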
