tf.keras loss becomes NaN

Ron*_*kat 5 python machine-learning neural-network mnist tf.keras

I'm writing a 3-layer neural network in tf.keras. My dataset is MNIST. I reduced the number of examples in the dataset to keep the runtime low. Here is my code:

import tensorflow as tf
from tensorflow.keras import layers
import numpy as np
import pandas as pd

!git clone https://github.com/DanorRon/data
%cd data
!ls

batch_size = 32
epochs = 10
alpha = 0.0001
lambda_ = 0
h1 = 50

train = pd.read_csv('/content/first-repository/mnist_train.csv.zip')
test = pd.read_csv('/content/first-repository/mnist_test.csv.zip')

train = train.loc['1':'5000', :]
test = test.loc['1':'2000', :]

train = train.sample(frac=1).reset_index(drop=True)
test = test.sample(frac=1).reset_index(drop=True)

x_train = train.loc[:, '1x1':'28x28']
y_train = train.loc[:, 'label']

x_test = test.loc[:, '1x1':'28x28']
y_test = test.loc[:, 'label']

x_train = x_train.values
y_train = y_train.values

x_test = x_test.values
y_test = y_test.values

nb_classes = 10
targets = y_train.reshape(-1)
y_train_onehot = np.eye(nb_classes)[targets]

nb_classes = 10
targets = y_test.reshape(-1)
y_test_onehot = np.eye(nb_classes)[targets]

model = tf.keras.Sequential()
model.add(layers.Dense(784, input_shape=(784,)))
model.add(layers.Dense(h1, activation='relu', kernel_regularizer=tf.keras.regularizers.l2(lambda_)))
model.add(layers.Dense(10, activation='sigmoid', kernel_regularizer=tf.keras.regularizers.l2(lambda_)))

model.compile(optimizer=tf.train.GradientDescentOptimizer(alpha), 
             loss = 'categorical_crossentropy',
             metrics = ['accuracy'])

model.fit(x_train, y_train_onehot, epochs=epochs, batch_size=batch_size)

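Whenever I run it, one of three things happens:

  1. For a few epochs the loss decreases and the accuracy increases, then the loss becomes NaN for no apparent reason and the accuracy plummets.

  2. The loss and accuracy stay the same in every epoch. Usually the loss is 2.3025 and the accuracy is 0.0986.

  3. The loss starts at NaN (and stays that way), while the accuracy stays low.

Most of the time the model does one of these things, but sometimes it does something else. Which behavior I get seems to be completely random. I don't know what the problem is. How can I fix this?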
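For the NaN cases (items 1 and 3), here is a small NumPy sketch of one way a cross-entropy computation can produce NaN. It is an illustration of the arithmetic only, not a reproduction of what tf.keras does internally. The arrays `y_true` and `y_pred` and both helper functions are hypothetical and do not come from the code above.

```python
import numpy as np

def naive_cross_entropy(y_true, y_pred):
    # Direct translation of -sum(t * log(p)); breaks when p contains exact zeros.
    with np.errstate(divide='ignore', invalid='ignore'):
        return -np.sum(y_true * np.log(y_pred), axis=-1)

def clipped_cross_entropy(y_true, y_pred, eps=1e-7):
    # Keep probabilities away from 0 and 1 before taking the log.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# Hypothetical one-sample batch: the true class is index 2, and a saturated
# output has pushed one of the other classes to a probability of exactly 0.
y_true = np.array([[0.0, 0.0, 1.0]])
y_pred = np.array([[0.0, 0.3, 0.7]])

print(naive_cross_entropy(y_true, y_pred))    # NaN, because 0 * log(0) = 0 * -inf
print(clipped_cross_entropy(y_true, y_pred))  # finite, equal to -log(0.7)
```

The point of the sketch is that a single exact zero in the predicted probabilities is enough to turn the whole batch loss into NaN, even when the prediction for the true class is reasonable.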

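Edit: Sometimes the loss decreases but the accuracy stays the same. Also, sometimes the loss decreases and the accuracy increases, and then after a while the accuracy drops while the loss keeps decreasing. Or the loss decreases and the accuracy increases, then they switch: the loss increases quickly while the accuracy drops, finally ending at loss: 2.3025 acc: 0.0986.

Edit 2: Here is an example of what sometimes happens: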

Epoch 1/100
49999/49999 [==============================] - 5s 92us/sample - loss: 1.8548 - acc: 0.2390

Epoch 2/100
49999/49999 [==============================] - 5s 104us/sample - loss: 0.6894 - acc: 0.8050

Epoch 3/100
49999/49999 [==============================] - 4s 90us/sample - loss: 0.4317 - acc: 0.8821

Epoch 4/100
49999/49999 [==============================] - 5s 104us/sample - loss: 2.2178 - acc: 0.1345

Epoch 5/100
49999/49999 [==============================] - 5s 90us/sample - loss: 2.3025 - acc: 0.0986

Epoch 6/100
49999/49999 [==============================] - 4s 90us/sample - loss: 2.3025 - acc: 0.0986

Epoch 7/100
49999/49999 [==============================] - 4s 89us/sample - loss: 2.3025 - acc: 0.0986
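A note on the plateau at loss 2.3025 and accuracy 0.0986: these numbers are very close to what cross-entropy gives when every one of the 10 classes is predicted with equal probability, which would suggest the network has stopped distinguishing between digits at that point. A quick check of that arithmetic (the variable names are only for illustration):

```python
import math

num_classes = 10  # MNIST has ten digit classes

# Cross-entropy of a one-hot target against a uniform prediction:
# only the true class contributes, with predicted probability 1/num_classes.
uniform_loss = -math.log(1.0 / num_classes)

print(f"loss for a uniform prediction: {uniform_loss:.4f}")
print(f"chance-level accuracy:         {1.0 / num_classes:.4f}")
```

The uniform-prediction loss is the natural log of 10, and chance-level accuracy for ten classes is about 0.1, both of which line up with the values the run settles on.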

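Edit 3: I changed the loss to mean squared error, and now the network works fine. Is there a way to keep cross-entropy without it converging to a local minimum?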
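One direction that might be worth trying while keeping cross-entropy (an assumption, not something verified on this data) is to scale the pixel inputs to the range 0 to 1 and to pair cross-entropy with a softmax output instead of a sigmoid one. In tf.keras terms that would roughly mean dividing `x_train` and `x_test` by 255.0 and using `activation='softmax'` in the last `Dense` layer. The NumPy sketch below only illustrates the arithmetic behind those two changes. The helpers `scale_pixels` and `softmax_cross_entropy` and the sample arrays are hypothetical stand-ins, not part of the code above.

```python
import numpy as np

def scale_pixels(x):
    # Map raw 0..255 pixel intensities to floats in 0..1.
    return np.asarray(x, dtype=np.float32) / 255.0

def softmax_cross_entropy(logits, y_onehot):
    # Subtract the row maximum before exponentiating so exp() cannot overflow,
    # then compute log-softmax directly instead of log(softmax(...)).
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))
    return -np.sum(y_onehot * log_probs, axis=-1)

# Hypothetical stand-ins for a row of pixels and for the last layer's
# pre-activations; the logits are extreme on purpose.
pixels = np.array([[0, 128, 255]])
logits = np.array([[1000.0, 0.0, -1000.0]])
y_onehot = np.array([[0.0, 1.0, 0.0]])

print(scale_pixels(pixels))                     # values between 0 and 1
print(softmax_cross_entropy(logits, y_onehot))  # large but finite, no NaN
```

Working from logits with the maximum subtracted keeps the loss finite even for very confident wrong predictions, and smaller input magnitudes make such extreme logits less likely in the first place.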