Change the loss function dynamically during training in Keras, without recompiling other model properties such as the optimizer


Is it possible to set model.loss in a callback without recompiling model.compile(...) afterwards (since then the optimizer states get reset), and to just recompile model.loss, for example:

class NewCallback(Callback):
    def __init__(self):
        super(NewCallback, self).__init__()

    def on_epoch_end(self, epoch, logs={}):
        self.model.loss = [loss_wrapper(t_change, current_epoch=epoch)]
        self.model.compile_only_loss()  # is there a version or hack of
                                        # model.compile(...) like this?

To expand a bit more on previous examples from Stack Overflow:

To achieve a loss function that depends on the epoch number, like (as in this Stack Overflow question):

def loss_wrapper(t_change, current_epoch):
    def custom_loss(y_true, y_pred):
        c_epoch = K.get_value(current_epoch)
        if c_epoch < t_change:
            # compute loss_1
        else:
            # compute loss_2
    return custom_loss

where "current_epoch" is a Keras variable that gets updated with a callback:

current_epoch = K.variable(0.)
model.compile(optimizer=opt, loss=loss_wrapper(5, current_epoch), 
metrics=...)

class NewCallback(Callback):
    def __init__(self, current_epoch):
        super(NewCallback, self).__init__()
        self.current_epoch = current_epoch

    def on_epoch_end(self, epoch, logs={}):
        K.set_value(self.current_epoch, epoch)
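For reference, the callback is hooked up through fit() in the usual way (this call is not part of the original question; x_train and y_train are just placeholders for the real training data):

# Illustrative wiring only: x_train / y_train stand in for whatever data is used.
model.fit(x_train, y_train,
          epochs=10,
          callbacks=[NewCallback(current_epoch)])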

In essence, one can convert the Python code into a composition of backend functions for the loss to work, as follows:

def loss_wrapper(t_change, current_epoch):
    def custom_loss(y_true, y_pred):
        # compute loss_1 and loss_2
        bool_case_1=K.less(current_epoch,t_change)
        num_case_1=K.cast(bool_case_1,"float32")
        loss = (num_case_1)*loss_1 + (1-num_case_1)*loss_2
        return loss
    return custom_loss
This works.
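For concreteness, a minimal runnable sketch of that trick could look like this (loss_1 and loss_2 are filled in here as MSE and MAE purely for illustration; they are not from the original question):

from tensorflow.keras import backend as K

def loss_wrapper(t_change, current_epoch):
    def custom_loss(y_true, y_pred):
        # Placeholder losses: MSE before the switch, MAE after it.
        loss_1 = K.mean(K.square(y_true - y_pred), axis=-1)
        loss_2 = K.mean(K.abs(y_true - y_pred), axis=-1)
        # Evaluates to 1.0 while current_epoch < t_change, 0.0 afterwards.
        num_case_1 = K.cast(K.less(current_epoch, t_change), "float32")
        return num_case_1 * loss_1 + (1 - num_case_1) * loss_2
    return custom_loss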

I am not happy with these hacks, and I am wondering: is it possible to set model.loss in a callback without recompiling with model.compile(...) afterwards (since then the optimizer states get reset), and instead just recompile model.loss?

Answer by 小智:

I hope you have found a solution to your problem by now, but with tensorflow I think you can solve this by building a custom training loop (here is the documentation). This does not override the loss attribute as you asked, but you can probably still achieve what you are looking for.

Example

Initializing the variables

Modifying the example from the documentation with your model and dataset:

import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(784,), name="digits")
x1 = tf.keras.layers.Dense(64, activation="relu")(inputs)
x2 = tf.keras.layers.Dense(64, activation="relu")(x1)
outputs = tf.keras.layers.Dense(10, name="predictions")(x2)
model = tf.keras.Model(inputs=inputs, outputs=outputs)


# Prepare the training dataset.
batch_size = 64
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = np.reshape(x_train, (-1, 784))
x_test = np.reshape(x_test, (-1, 784))
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)

We can define our two loss functions (the two I chose make no sense from a scientific point of view, but they let us check that the code works):

# Instantiate an optimizer.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
# Instantiate a loss function.
loss_1 = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_2 = lambda y_true, y_pred: -1 * loss_1(y_true, y_pred)

The training loop

We can then run our custom training loop:

epochs = 10
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

        # Open a GradientTape to record the operations run
        # during the forward pass, which enables auto-differentiation.
        loss_fn = loss_1 if epoch % 2 else loss_2
        with tf.GradientTape() as tape:

            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far: %s samples" % ((step + 1) * 64))

We check that the output is what we want (alternating positive and negative losses):

Start of epoch 0
Training loss (for one batch) at step 0: -96.1003
Seen so far: 64 samples
Training loss (for one batch) at step 200: -3383849.5000
Seen so far: 12864 samples
Training loss (for one batch) at step 400: -40419124.0000
Seen so far: 25664 samples
Training loss (for one batch) at step 600: -149133008.0000
Seen so far: 38464 samples
Training loss (for one batch) at step 800: -328322816.0000
Seen so far: 51264 samples

Start of epoch 1
Training loss (for one batch) at step 0: 580457984.0000
Seen so far: 64 samples
Training loss (for one batch) at step 200: 297710528.0000
Seen so far: 12864 samples
Training loss (for one batch) at step 400: 213328544.0000
Seen so far: 25664 samples
Training loss (for one batch) at step 600: 159328976.0000
Seen so far: 38464 samples
Training loss (for one batch) at step 800: 105737024.0000
Seen so far: 51264 samples

Drawbacks and further improvements

The problem with writing a custom loop is that you lose the convenience of keras' fit method. I think you can manage this by defining a custom model and overriding train_step, as shown here in the documentation; a rough sketch of that idea follows below.
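For illustration only, a minimal sketch of the train_step override might look like the following (SwitchLossModel, EpochCounter, epoch_var and T_CHANGE are made-up names; loss_1, loss_2, inputs, outputs and train_dataset are the ones defined above):

import tensorflow as tf

T_CHANGE = 5  # made-up switch point for this sketch
epoch_var = tf.Variable(0, trainable=False, dtype=tf.int64)

class SwitchLossModel(tf.keras.Model):
    # Only train_step is overridden; it picks loss_1 or loss_2 depending on
    # the epoch counter, so nothing is ever recompiled during training.
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss_value = tf.cond(
                epoch_var < T_CHANGE,
                lambda: loss_1(y, y_pred),
                lambda: loss_2(y, y_pred),
            )
        grads = tape.gradient(loss_value, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        return {"loss": loss_value}

class EpochCounter(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        epoch_var.assign(epoch)

# Same architecture as above, but built through the subclass so fit() works:
model = SwitchLossModel(inputs=inputs, outputs=outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3))  # loss handled in train_step
model.fit(train_dataset, epochs=10, callbacks=[EpochCounter()])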

If you really do need to change the loss attribute of your model, you can set the compiled_loss attribute using a keras.engine.compile_utils.LossesContainer (here is the reference) and set model.train_function with model.make_train_function() (so that the new loss is taken into account); a rough, version-dependent sketch follows below.
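As an illustration only (this relies on private Keras internals, so the import path and behaviour can differ between TF/Keras releases; SwapLossCallback, new_loss and switch_epoch are made-up names), such a callback might look roughly like this:

import tensorflow as tf

# compile_utils is a private module; its location varies across versions.
try:
    from keras.engine import compile_utils              # standalone Keras 2.x
except ImportError:
    from tensorflow.python.keras.engine import compile_utils  # older TF bundles

class SwapLossCallback(tf.keras.callbacks.Callback):
    def __init__(self, new_loss, switch_epoch):
        super().__init__()
        self.new_loss = new_loss
        self.switch_epoch = switch_epoch

    def on_epoch_begin(self, epoch, logs=None):
        if epoch == self.switch_epoch:
            # Replace the container that Model.train_step reads its loss from ...
            self.model.compiled_loss = compile_utils.LossesContainer(
                self.new_loss, output_names=self.model.output_names)
            # ... and drop the cached train function so the next batch is traced
            # with the new loss; the optimizer and its state are left untouched.
            self.model.train_function = None
            self.model.make_train_function()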