tf keras SparseCategoricalCrossentropy and sparse_categorical_accuracy report wrong values during training

kaw*_*vin 5 keras tensorflow cross-entropy

This is tf 2.3.0. During training, the reported values of SparseCategoricalCrossentropy loss and sparse_categorical_accuracy seem way off. I've looked through my code but haven't been able to spot any mistake. Here is the code to reproduce:

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

x = np.random.randint(0, 255, size=(64, 224, 224, 3)).astype('float32')
y = np.random.randint(0, 3, (64, 1)).astype('int32')

ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

def create_model():
  input_layer = tf.keras.layers.Input(shape=(224, 224, 3), name='img_input')
  x = tf.keras.layers.experimental.preprocessing.Rescaling(1./255, name='rescale_1_over_255')(input_layer)

  base_model = tf.keras.applications.ResNet50(input_tensor=x, weights='imagenet', include_top=False)

  x = tf.keras.layers.GlobalAveragePooling2D(name='global_avg_pool_2d')(base_model.output)

  output = Dense(3, activation='softmax', name='predictions')(x)

  return tf.keras.models.Model(inputs=input_layer, outputs=output)

model = create_model()

model.compile(
  optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
  loss=tf.keras.losses.SparseCategoricalCrossentropy(), 
  metrics=['sparse_categorical_accuracy']
)

model.fit(ds, steps_per_epoch=2, epochs=5)

This is what gets printed:

Epoch 1/5
2/2 [==============================] - 0s 91ms/step - loss: 1.5160 - sparse_categorical_accuracy: 0.2969
Epoch 2/5
2/2 [==============================] - 0s 85ms/step - loss: 0.0892 - sparse_categorical_accuracy: 1.0000
Epoch 3/5
2/2 [==============================] - 0s 84ms/step - loss: 0.0230 - sparse_categorical_accuracy: 1.0000
Epoch 4/5
2/2 [==============================] - 0s 82ms/step - loss: 0.0109 - sparse_categorical_accuracy: 1.0000
Epoch 5/5
2/2 [==============================] - 0s 82ms/step - loss: 0.0065 - sparse_categorical_accuracy: 1.0000

However, if I double-check with model.evaluate, and verify the accuracy "manually":

model.evaluate(ds)

2/2 [==============================] - 0s 25ms/step - loss: 1.2681 - sparse_categorical_accuracy: 0.2188
[1.268101453781128, 0.21875]

y_pred = model.predict(ds)
y_pred = np.argmax(y_pred, axis=-1)
y_pred = y_pred.reshape(-1, 1)
np.sum(y == y_pred)/len(y)

0.21875

The result from model.evaluate(...) agrees with the metrics checked "manually". But if you look at the loss/metrics from training, they are far off. It is hard to see what is wrong, since no error or exception is ever thrown.
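For reference, the loss can be cross-checked "manually" the same way as the accuracy. Below is a minimal numpy-only sketch of what SparseCategoricalCrossentropy computes; `sparse_cce` is a hypothetical helper name, not a Keras API:

```python
import numpy as np

# Hypothetical helper: sparse categorical cross-entropy is the mean
# negative log-probability that the model assigns to the true class.
def sparse_cce(y_true, y_prob, eps=1e-7):
    # y_true: (N, 1) integer labels; y_prob: (N, C) softmax probabilities
    p = y_prob[np.arange(len(y_prob)), y_true.ravel()]
    return float(-np.mean(np.log(np.clip(p, eps, 1.0))))
```

With uniform predictions over 3 classes this gives ln 3 ≈ 1.0986, which is in the same ballpark as the loss evaluate() reports here.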

Additionally, I created a very simple case to try to reproduce this, but actually could not reproduce it here. Note that batch_size == length of the data, so this is not mini-batch GD but full-batch GD (to rule out any confusion over mini-batch loss/metrics):

x = np.random.randn(1024, 1).astype('float32')
y = np.random.randint(0, 3, (1024, 1)).astype('int32')
ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(1024)
model = Sequential()
model.add(Dense(3, activation='softmax'))
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(), 
    metrics=['sparse_categorical_accuracy']
)
model.fit(ds, epochs=5)
model.evaluate(ds)

As I mentioned in the comments, one suspect is the batch norm layers, which the non-reproducing case does not have.
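That suspicion is plausible: BatchNormalization normalizes with the current batch's statistics during training, but with moving averages at inference time, so a freshly fine-tuned network can behave very differently between fit() and evaluate(). A rough numpy sketch of the two modes (hypothetical helpers, not the Keras implementation):

```python
import numpy as np

# Training mode: normalize with the current batch's own statistics.
def bn_train(x, eps=1e-3):
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# Inference mode: normalize with moving averages accumulated during training
# (Keras initializes moving_mean=0 and moving_variance=1).
def bn_infer(x, moving_mean, moving_var, eps=1e-3):
    return (x - moving_mean) / np.sqrt(moving_var + eps)
```

Until the moving averages catch up with the data's actual statistics, the two modes produce different activations, which shows up as a train/eval metric gap. Freezing the BN layers, or comparing predictions from `model(x, training=False)`, is one way to probe this.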

SvG*_*vGA 0

You get different results because fit() displays the training loss as the average of the losses over each batch of training data in the current epoch. This can pull the epoch average away from the final value, because the computed losses are also used to keep updating the model. evaluate(), however, is computed with the model as it stands at the end of training, resulting in a different loss. You can check the official Keras FAQ and the related StackOverflow post.

Also, try increasing the learning rate.
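The averaging effect alone is easy to see with made-up numbers (the per-batch losses below are hypothetical, just to illustrate the mechanism):

```python
import numpy as np

# fit() reports the running mean of per-batch losses, each computed with the
# weights as they were *at that step*; evaluate() uses only the final weights.
batch_losses = [1.5, 0.9, 0.4, 0.2]   # hypothetical losses as weights improve
fit_reported = float(np.mean(batch_losses))

final_weights_loss = 0.15             # hypothetical evaluate() on final weights
```

So even with identical data, the number printed during the epoch and the number from a post-hoc evaluate() need not agree.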