How to accumulate gradients in TensorFlow 2.0?

Question


Nag*_*S N 8 python tensorflow tensorflow2.0

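I am training a model with TensorFlow 2.0. The images in my training set have different resolutions. The model I built can handle variable resolutions (convolutional layers followed by global averaging). My training set is very small, and I want to use the full training set in a single batch.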

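Because my images have different resolutions, I can't use model.fit(). So I plan to pass each sample through the network individually, accumulate the errors/gradients, and then apply one optimizer step. I am able to compute the loss values, but I don't know how to accumulate the losses/gradients. How can I accumulate the losses/gradients and then apply a single optimizer step?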

Code:

for i in range(num_epochs):
    print(f'Epoch: {i + 1}')
    total_loss = 0
    for j in tqdm(range(num_samples)):
        sample = samples[j]
        with tf.GradientTape() as tape:
            prediction = self.model(sample)
            loss_value = self.loss_function(y_true=labels[j], y_pred=prediction)
        gradients = tape.gradient(loss_value, self.model.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))
        total_loss += loss_value

    epoch_loss = total_loss / num_samples
    print(f'Epoch loss: {epoch_loss}')


Answer 1

Ram*_*.C. 9

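If I understand this sentence correctly:

How can I accumulate the losses/gradients and then apply a single optimizer step?

@Nagabhushan is trying to accumulate gradients and then apply the optimization on the (averaged) accumulated gradients. The answer provided by @TensorflowSupport does not address this. To perform the optimization only once while accumulating the gradients from several tapes, you can do the following: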

for i in range(num_epochs):
    print(f'Epoch: {i + 1}')
    total_loss = 0

    # get trainable variables
    train_vars = self.model.trainable_variables
    # Create empty gradient list (not a tf.Variable list)
    accum_gradient = [tf.zeros_like(this_var) for this_var in train_vars]

    for j in tqdm(range(num_samples)):
        sample = samples[j]
        with tf.GradientTape() as tape:
            prediction = self.model(sample)
            loss_value = self.loss_function(y_true=labels[j], y_pred=prediction)
        total_loss += loss_value

        # Gradients of this sample's loss with respect to the trainable variables
        gradients = tape.gradient(loss_value, train_vars)
        # Accumulate the gradients
        accum_gradient = [acc_grad + grad for acc_grad, grad in zip(accum_gradient, gradients)]

    # After all samples have been processed, average the accumulated gradients
    accum_gradient = [this_grad / num_samples for this_grad in accum_gradient]
    # Apply a single optimization step
    self.optimizer.apply_gradients(zip(accum_gradient, train_vars))

    epoch_loss = total_loss / num_samples
    print(f'Epoch loss: {epoch_loss}')

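As a sanity check on the averaging step, here is a small pure-Python sketch (no TensorFlow) showing that the average of the per-sample gradients matches the gradient of the mean loss over the whole set. The one-parameter-pair linear model, the data values and the helper names `per_sample_grad` and `mean_loss` are made up for illustration and are not part of the original code.

```python
def per_sample_grad(w, b, x, y):
    """Gradient of the squared error (w*x + b - y)**2 with respect to (w, b)."""
    err = w * x + b - y
    return 2.0 * err * x, 2.0 * err


def mean_loss(w, b, xs, ys):
    """Mean squared error over the whole (tiny) training set."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data and parameters, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 4.0]
w, b = 0.5, -1.0

# Accumulate one gradient per sample, then average once at the end.
acc_w, acc_b = 0.0, 0.0
for x, y in zip(xs, ys):
    gw, gb = per_sample_grad(w, b, x, y)
    acc_w += gw
    acc_b += gb
avg_w = acc_w / len(xs)
avg_b = acc_b / len(xs)

# Reference: central-difference gradient of the mean loss over the full set.
eps = 1e-6
num_w = (mean_loss(w + eps, b, xs, ys) - mean_loss(w - eps, b, xs, ys)) / (2 * eps)
num_b = (mean_loss(w, b + eps, xs, ys) - mean_loss(w, b - eps, xs, ys)) / (2 * eps)

print(avg_w, num_w)
print(avg_b, num_b)
```

The two printed pairs should agree up to floating-point error, which is why a single optimizer step on the averaged gradients behaves like one full-batch step when the loss is a mean over samples.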

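You should avoid using tf.Variable() inside the training loop, because it will produce errors when the code is executed as a graph. If you use tf.Variable() inside your training function and then decorate it with "@tf.function" or apply "tf.function(my_train_fcn)" to obtain a graph function (i.e. for better performance), execution will raise an error. This happens because tracing a function that creates a tf.Variable behaves differently from eager execution (the variable is reused or created anew, respectively). You can find more information on this in the TensorFlow documentation.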
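
One possible way to combine gradient accumulation with graph execution is to create the accumulator variables once, outside the traced function, and only update them in place inside it. The following is a minimal sketch under assumptions, not the answer's original code: the tiny functional model, the random data, the learning rate and the name `accumulate_step` are all hypothetical, and the optimizer step is kept in eager mode for simplicity.

```python
import tensorflow as tf

# Hypothetical stand-in model: a conv layer followed by global average pooling,
# so inputs of any spatial resolution are accepted.
inputs = tf.keras.Input(shape=(None, None, 3))
x = tf.keras.layers.Conv2D(4, 3, padding="same", activation="relu")(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)

loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
train_vars = model.trainable_variables

# The accumulators are created once, outside the traced function.
accum = [
    tf.Variable(tf.zeros(v.shape, dtype=v.dtype), trainable=False)
    for v in train_vars
]


# The input signature leaves height and width open, so different resolutions
# reuse the same traced graph.
@tf.function(input_signature=[
    tf.TensorSpec([1, None, None, 3], tf.float32),
    tf.TensorSpec([1, 1], tf.float32),
])
def accumulate_step(image, label):
    with tf.GradientTape() as tape:
        loss = loss_fn(label, model(image, training=True))
    grads = tape.gradient(loss, train_vars)
    for acc, grad in zip(accum, grads):
        acc.assign_add(grad)  # in-place update; no variable is created here
    return loss


# Made-up data: three images with different resolutions.
samples = [tf.random.normal([1, s, s, 3]) for s in (8, 12, 16)]
labels = [tf.constant([[1.0]]) for _ in samples]

for image, label in zip(samples, labels):
    accumulate_step(image, label)

# Average the accumulated gradients, apply one optimizer step, then reset.
averaged = [acc / len(samples) for acc in accum]
optimizer.apply_gradients(zip(averaged, train_vars))
for acc in accum:
    acc.assign(tf.zeros_like(acc))
```

Because the accumulators already exist when the function is traced, the traced code only calls assign_add on them, which avoids the variable-creation problem described above.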

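Hi A_Murphy, memory usage should not increase. I have used this approach to train large models for many steps and have not seen any memory leak. However, I have never used it for training in eager mode, where re-creating variables instead of deleting them could cause some problems. Eager mode allows variables to be created inside a loop; in graph mode this does not happen. (2 upvotes)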
