CNN training accuracy improves during training, but test accuracy stays around 40%

woh*_*he1 3 python numpy conv-neural-network keras tensorflow

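I've been learning about neural networks with Tensorflow and Keras for the past few months, so I wanted to try building a model for the CIFAR10 dataset (code below).

However, during training the accuracy improves (from about 35% after 1 epoch to about 60-65% after 5 epochs), but val_acc stays the same or increases only slightly. Here is the printed output: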

Epoch 1/5
50000/50000 [==============================] - 454s 9ms/step - loss: 1.7761 - acc: 0.3584 - val_loss: 8.6776 - val_acc: 0.4489
Epoch 2/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.3670 - acc: 0.5131 - val_loss: 8.9749 - val_acc: 0.4365
Epoch 3/5
50000/50000 [==============================] - 451s 9ms/step - loss: 1.2089 - acc: 0.5721 - val_loss: 7.7254 - val_acc: 0.5118
Epoch 4/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.1140 - acc: 0.6080 - val_loss: 7.9587 - val_acc: 0.4997
Epoch 5/5
50000/50000 [==============================] - 452s 9ms/step - loss: 1.0306 - acc: 0.6385 - val_loss: 7.4351 - val_acc: 0.5321
10000/10000 [==============================] - 27s 3ms/step
loss:  7.435152648162842 
accuracy:  0.5321

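I looked around on the internet, and my best guess is that my model is overfitting, so I tried removing some layers, adding more dropout layers and reducing the number of filters, but none of that showed any improvement.

The strangest thing is that a while ago I built a very similar model based on some tutorials, and it ended with 80% accuracy after 8 epochs. (I lost that file, though.)

Here is the code for my model: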

model = Sequential()
model.add(Conv2D(filters=256,
                 kernel_size=(3, 3),
                 activation='relu',
                 data_format='channels_last',
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(filters=128,
                 kernel_size=(2, 2),
                 activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))


model.compile(optimizer=adam(),
              loss=categorical_crossentropy,
              metrics=['accuracy'])

model.fit(train_images, train_labels,
          batch_size=1000,
          epochs=5,
          verbose=1,
          validation_data=(test_images, test_labels))

loss, accuracy = model.evaluate(test_images, test_labels)
print('loss: ', loss, '\naccuracy: ', accuracy)

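train_images and test_images are numpy arrays of shape (50000, 32, 32, 3) and (10000, 32, 32, 3); train_labels and test_labels are numpy arrays of shape (50000, 10) and (10000, 10).

My question: what is causing this, and what can I do about it?

Edit after Maxim's answer:

I changed the model to: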

model = Sequential()
model.add(Conv2D(filters=64,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal',    # better for relu based networks
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(filters=256,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(10, activation='softmax'))

Now the output is as follows:

Epoch 1/10
50000/50000 [==============================] - 326s 7ms/step - loss: 1.4916 - acc: 0.4809 - val_loss: 7.7175 - val_acc: 0.5134
Epoch 2/10
50000/50000 [==============================] - 338s 7ms/step - loss: 1.0622 - acc: 0.6265 - val_loss: 6.9945 - val_acc: 0.5588
Epoch 3/10
50000/50000 [==============================] - 326s 7ms/step - loss: 0.8957 - acc: 0.6892 - val_loss: 6.6270 - val_acc: 0.5833
Epoch 4/10
50000/50000 [==============================] - 324s 6ms/step - loss: 0.7813 - acc: 0.7271 - val_loss: 5.5790 - val_acc: 0.6474
Epoch 5/10
50000/50000 [==============================] - 327s 7ms/step - loss: 0.6690 - acc: 0.7668 - val_loss: 5.7479 - val_acc: 0.6358
Epoch 6/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.5671 - acc: 0.8031 - val_loss: 5.8720 - val_acc: 0.6302
Epoch 7/10
50000/50000 [==============================] - 328s 7ms/step - loss: 0.4865 - acc: 0.8319 - val_loss: 5.6320 - val_acc: 0.6451
Epoch 8/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.3995 - acc: 0.8611 - val_loss: 5.3879 - val_acc: 0.6615
Epoch 9/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.3337 - acc: 0.8837 - val_loss: 5.6874 - val_acc: 0.6432
Epoch 10/10
50000/50000 [==============================] - 320s 6ms/step - loss: 0.2806 - acc: 0.9033 - val_loss: 5.7424 - val_acc: 0.6399
10000/10000 [==============================] - 19s 2ms/step
loss:  5.74234927444458 
accuracy:  0.6399

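It seems I'm overfitting again, even though I changed the model with the help I've received so far. Any explanation or tips?

The input images are numpy arrays of shape (32, 32, 3), normalized to (0, 1).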
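
One way to make the overfitting claim concrete is to compute the gap between training and validation accuracy per epoch. The sketch below does that with the values copied from the 10-epoch log above; the variable names are illustrative and not part of the original code.

```python
# Per-epoch metrics copied from the 10-epoch training log above.
acc = [0.4809, 0.6265, 0.6892, 0.7271, 0.7668,
       0.8031, 0.8319, 0.8611, 0.8837, 0.9033]
val_acc = [0.5134, 0.5588, 0.5833, 0.6474, 0.6358,
           0.6302, 0.6451, 0.6615, 0.6432, 0.6399]

# Gap between training and validation accuracy for each epoch.
gaps = [round(a - v, 4) for a, v in zip(acc, val_acc)]

# Epoch (1-based) with the highest validation accuracy.
best_epoch = max(range(len(val_acc)), key=lambda i: val_acc[i]) + 1

for epoch, (a, v, g) in enumerate(zip(acc, val_acc, gaps), start=1):
    print('epoch {:2d}  acc {:.4f}  val_acc {:.4f}  gap {:+.4f}'.format(epoch, a, v, g))
print('best val_acc at epoch', best_epoch)
```

A gap that keeps widening while val_acc stops improving is the usual sign that further epochs mostly fit the training set.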

Max*_*xim 5

You haven't included how you prepare the data. Here's one addition that made this network learn much better:

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

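If you normalize the data like this, your network is fine: it reaches about 65-70% test accuracy after 5 epochs, which is a good result. Note that 5 epochs is just a start; it would take roughly 30-50 epochs to really learn the data and show results close to the state of the art.

Here are some minor improvements I noticed that can give you a few extra points of performance: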
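
A minimal sketch of the scaling step above, using small random uint8 arrays as a stand-in for the CIFAR-10 images. The helper name `scale_to_unit` and the stand-in data are assumptions for illustration only. The point is that the training and test arrays go through the same transformation; if only one split is scaled, a high val_loss next to a reasonable training loss may be one of the symptoms.

```python
import numpy as np


def scale_to_unit(images):
    """Convert uint8 pixel values in [0, 255] to float32 values in [0, 1]."""
    return images.astype('float32') / 255.0


# Stand-in data with the same dtype and per-image shape as CIFAR-10
# (assumption: the real arrays would come from cifar10.load_data()).
rng = np.random.RandomState(0)
x_train = rng.randint(0, 256, size=(8, 32, 32, 3)).astype('uint8')
x_test = rng.randint(0, 256, size=(4, 32, 32, 3)).astype('uint8')

# Apply the same scaling to both splits.
x_train_scaled = scale_to_unit(x_train)
x_test_scaled = scale_to_unit(x_test)

# Quick sanity check: both splits should report float32 and a range within [0, 1].
for name, arr in [('train', x_train_scaled), ('test', x_test_scaled)]:
    print(name, arr.dtype, float(arr.min()), float(arr.max()))
```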

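  • Since you're using a ReLU-based network, the he_normal initializer is a better choice than glorot_uniform (the default in Conv2D).
  • It's odd to decrease the number of filters as you go deeper into the network. You should do the opposite. I changed 256 -> 64 and 128 -> 256, and accuracy improved.
  • I slightly decreased the dropout rate, 0.5 -> 0.4.
  • A 3x3 kernel size is more common than 2x2. I think you should try it for the second conv layer as well. In fact, you can play with all the hyperparameters to find the best combination.

Here's the final code: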
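
To illustrate the first point, here is a small sketch of the weight scales the two initializers target for the first conv layer (3x3 kernel, 3 input channels, 64 filters). The helper names are hypothetical. The formulas are the standard definitions: he_normal aims for a standard deviation of sqrt(2 / fan_in), and glorot_uniform samples uniformly from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)).

```python
import math


def conv_fans(kernel_size, in_channels, filters):
    """Return (fan_in, fan_out) for a 2D convolution kernel."""
    receptive_field = kernel_size[0] * kernel_size[1]
    return receptive_field * in_channels, receptive_field * filters


def he_normal_stddev(fan_in):
    # he_normal targets stddev = sqrt(2 / fan_in)
    return math.sqrt(2.0 / fan_in)


def glorot_uniform_limit(fan_in, fan_out):
    # glorot_uniform samples from [-limit, limit], limit = sqrt(6 / (fan_in + fan_out))
    return math.sqrt(6.0 / (fan_in + fan_out))


# First conv layer: 3x3 kernel, 3 input channels, 64 filters.
fan_in, fan_out = conv_fans((3, 3), in_channels=3, filters=64)
he_std = he_normal_stddev(fan_in)
glorot_limit = glorot_uniform_limit(fan_in, fan_out)
# A uniform distribution on [-limit, limit] has stddev limit / sqrt(3).
glorot_std = glorot_limit / math.sqrt(3.0)

print('fan_in:', fan_in, 'fan_out:', fan_out)
print('he_normal stddev:', round(he_std, 4))
print('glorot_uniform limit:', round(glorot_limit, 4))
print('glorot_uniform stddev:', round(glorot_std, 4))
```

For this layer the he_normal scale comes out noticeably larger, because it depends only on fan_in while the Glorot scale is also shrunk by the much larger fan_out.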

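# Imports are not shown in the original post; these assume standalone Keras 2.x,
# which exposed the lowercase `adam` alias for the Adam optimizer.
from keras.datasets import cifar10
from keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from keras.losses import categorical_crossentropy
from keras.models import Sequential
from keras.optimizers import adam
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = cifar10.load_data()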
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

model = Sequential()
model.add(Conv2D(filters=64,
                 kernel_size=(3, 3),
                 activation='relu',
                 kernel_initializer='he_normal',
                 input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(filters=256,
                 kernel_size=(2, 2),
                 kernel_initializer='he_normal',
                 activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(10, activation='softmax'))

model.compile(optimizer=adam(),
              loss=categorical_crossentropy,
              metrics=['accuracy'])

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

model.fit(x_train, y_train,
          batch_size=500,
          epochs=5,
          verbose=1,
          validation_data=(x_test, y_test))

loss, accuracy = model.evaluate(x_test, y_test)
print('loss: ', loss, '\naccuracy: ', accuracy)

Results after 5 epochs:

loss:  0.822134458447 
accuracy:  0.7126

By the way, you might be interested in comparing your approach with the Keras example CIFAR-10 conv net.
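
When comparing architectures, it can help to trace feature-map shapes and parameter counts by hand. The sketch below does this for the final code above, assuming the Keras defaults that code relies on: 'valid' padding and stride 1 for Conv2D, and a stride equal to the pool size for MaxPooling2D. The helper names and the layer list are illustrative.

```python
def conv_out(size, kernel):
    # 'valid' padding, stride 1: output = input - kernel + 1
    return size - kernel + 1


def pool_out(size, pool):
    # Stride equals pool size, 'valid' padding: floor division
    return size // pool


size, channels = 32, 3
# (kind, kernel or pool size, filters) for the conv part of the final model.
layers = [('conv', 3, 64), ('pool', 2, None), ('conv', 2, 256), ('pool', 2, None)]

conv_params = 0
for kind, k, filters in layers:
    if kind == 'conv':
        params = (k * k * channels + 1) * filters  # weights plus one bias per filter
        size, channels = conv_out(size, k), filters
    else:
        params = 0  # pooling has no trainable parameters
        size = pool_out(size, k)
    conv_params += params
    print(kind, (size, size, channels), params)

flat = size * size * channels
# Dense(1024) on the flattened features, then Dense(10); each has weights plus biases.
dense_params = (flat + 1) * 1024 + (1024 + 1) * 10
print('flatten:', flat)
print('conv params:', conv_params, 'dense params:', dense_params)
```

Under these assumptions nearly all trainable parameters sit in the first Dense layer, which is one reason dropout is applied there.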