Improving real-world results of a neural network trained on the MNIST dataset

Question

Joh*_*nna 7 python machine-learning mnist handwriting-recognition keras

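I have built a neural network with Keras using the MNIST dataset, and now I am trying to use it on photos of actual handwritten digits. Of course I don't expect the results to be perfect, but the results I currently get leave a lot of room for improvement.

For starters, I test it with some photos of individual digits written in my clearest handwriting. They are square and have the same dimensions and colour as the images in the MNIST dataset. They are saved in a folder called individual_test, with names like 7(2)_digit.jpg.

The network is often very sure of a wrong result. Here is an example:

The result I get for this picture is the following: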

result:  3 . probabilities:  [1.9963557196245318e-10, 7.241294497362105e-07, 0.02658148668706417, 0.9726449251174927, 2.5416460047722467e-08, 2.6078915027483163e-08, 0.00019745019380934536, 4.8302300825753264e-08, 0.0005754049634560943, 2.8358477788259506e-09]


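So the network is 97% sure that this is a 3, and this picture is by far not the only case. Out of 38 pictures, only 16 were recognised correctly. What shocked me is that the network is so sure of its result even though it could not be further from the correct one.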
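To see how far off the prediction is, it can help to rank the whole probability vector instead of looking only at the winner. The sketch below uses the probabilities printed above; the helper name top_k is hypothetical, and the true digit 7 is taken from the file name 7(2)_digit.jpg.

```python
import numpy as np

# Probabilities copied from the first output in the question.
probs = np.array([
    1.9963557196245318e-10, 7.241294497362105e-07, 0.02658148668706417,
    0.9726449251174927, 2.5416460047722467e-08, 2.6078915027483163e-08,
    0.00019745019380934536, 4.8302300825753264e-08, 0.0005754049634560943,
    2.8358477788259506e-09,
])

def top_k(probabilities, k=3):
    """Return the k most likely (digit, probability) pairs, best first."""
    order = np.argsort(probabilities)[::-1][:k]
    return [(int(d), float(probabilities[d])) for d in order]

ranking = top_k(probs)

# Rank of the true digit (assumed to be 7, from the file name).
true_digit = 7
order = np.argsort(probs)[::-1].tolist()
rank_of_truth = order.index(true_digit) + 1

print(ranking)
print("rank of the true digit:", rank_of_truth)
```

Here the true digit only ranks sixth, which suggests the input looks systematically different from the training data rather than being a borderline case.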

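EDIT
After adding a threshold to prepare_image (img = cv2.threshold(img, 0.1, 1, cv2.THRESH_BINARY_INV)[1]), performance improved slightly. It now gets 19 out of 38 pictures right, but for some images, including the one shown above, it is still quite sure of the wrong result. This is what I get now: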

result:  3 . probabilities:  [1.0909866760000497e-11, 1.1584616004256532e-06, 0.27739930152893066, 0.7221096158027649, 1.900260038212309e-08, 6.555900711191498e-08, 4.479645940591581e-05, 6.455550760620099e-07, 0.0004443934594746679, 1.0013242457418414e-09]


So it is now only 72% sure of its result, which is better, but still...
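For reference, here is a small NumPy sketch of what the cv2.THRESH_BINARY_INV call does to a row of pixels. The function name binary_inv and the toy pixel values are assumptions made for illustration; only the thresholding rule (0 where the pixel is above the threshold, maxval otherwise) mirrors the OpenCV behaviour.

```python
import numpy as np

def binary_inv(img, thresh=0.1, maxval=1.0):
    """NumPy equivalent of cv2.THRESH_BINARY_INV: 0 where img > thresh, else maxval."""
    return np.where(img > thresh, 0.0, maxval).astype("float32")

# Toy row of a photo scaled to 0-1: white paper, grey edge, dark ink.
row = np.array([1.0, 0.9, 0.6, 0.3, 0.05, 0.0], dtype="float32")
row[row < 0.5] = 0.0           # same clipping step as in prepare_image
binary = binary_inv(row)
print(binary)                  # every pixel is now exactly 0 or 1
```

Because everything below 0.5 is clipped to 0 first, the 0.1 threshold effectively acts at 0.5, and the output is strictly binary. MNIST digits contain grey levels along the stroke edges, so this may be one way the prepared photos still differ from the training images.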

What can I do to improve the performance? Can I prepare my images better? Or should I add my own images to the training data? And if so, how would I do that?

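EDIT

This is what the picture shown above looks like after applying prepare_image:

After using the threshold, this is what the same picture looks like:

For comparison, this is one of the pictures provided by the MNIST dataset:

They look quite similar to me. How can I improve this?
Here is my code (including the threshold):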

# import keras and the MNIST dataset
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
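# use the Keras bundled with TensorFlow consistently; the alias keeps np_utils.to_categorical working
from tensorflow.keras import utils as np_utils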
# numpy is necessary since keras uses numpy arrays
import numpy as np

# imports for pictures
import matplotlib.pyplot as plt
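# import the submodules explicitly; a bare "import PIL" does not load them
import PIL.Image
import PIL.JpegImagePlugin
import PIL.PngImagePlugin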
import cv2

# imports for tests
import random
import os

class mnist_network():
    def __init__(self):
        """ load data, create and train model """
        # load data
        (X_train, y_train), (X_test, y_test) = mnist.load_data()
        # flatten 28*28 images to a 784 vector for each image
        num_pixels = X_train.shape[1] * X_train.shape[2]
        X_train = X_train.reshape((X_train.shape[0], num_pixels)).astype('float32')
        X_test = X_test.reshape((X_test.shape[0], num_pixels)).astype('float32')
        # normalize inputs from 0-255 to 0-1
        X_train = X_train / 255
        X_test = X_test / 255
        # one hot encode outputs
        y_train = np_utils.to_categorical(y_train)
        y_test = np_utils.to_categorical(y_test)
        num_classes = y_test.shape[1]


        # create model
        self.model = Sequential()
        self.model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
        self.model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
        # Compile model
        self.model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

        # train the model
        self.model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)

        self.train_img = X_train
        self.train_res = y_train
        self.test_img = X_test
        self.test_res = y_test


    def predict_result(self, img, show = False):
        """ predicts the number in a picture (vector) """
        assert type(img) == np.ndarray and img.shape == (784,)

        if show:
            img = img.reshape((28, 28))
            # show the picture
            plt.imshow(img, cmap='Greys')
            plt.show()
            img = img.reshape(img.shape[0] * img.shape[1])

        num_pixels = img.shape[0]
        # run the network once and reuse the output for both return values
        res_probabilities = self.model.predict(img.reshape(-1, num_pixels))
        # the predicted digit is the class with the highest probability
        res_number = np.argmax(res_probabilities, axis=1)

        # the batch holds a single image, so only the first element is needed
        return (res_number[0], res_probabilities.tolist()[0])


    def prepare_image(self, img, show = False):
        """ prepares the partial images used in partial_img_rec by transforming them
            into numpy arrays that the network will be able to process """
        # convert to greyscale
        img = img.convert("L")
        # rescale image to 28 x 28 (ANTIALIAS was an alias of LANCZOS and was removed in Pillow 10)
        img = img.resize((28,28), PIL.Image.LANCZOS)
        # inverse colors since the training images have a black background
        #img =  PIL.ImageOps.invert(img)
        # transform to vector
        img = np.asarray(img, "float32")
        img = img / 255.
        img[img < 0.5] = 0.

        img = cv2.threshold(img, 0.1, 1, cv2.THRESH_BINARY_INV)[1]

        if show:
            plt.imshow(img, cmap = "Greys")

        # flatten image to 28*28 = 784 vector
        num_pixels = img.shape[0] * img.shape[1]
        img = img.reshape(num_pixels)

        return img


    def partial_img_rec(self, image, upper_left, lower_right, results=None, show = False):
        """ partial is a part of an image """
        # a mutable default argument would be shared between calls
        if results is None:
            results = []
        left_x, left_y = upper_left
        right_x, right_y = lower_right

        print("current test part: ", upper_left, lower_right)
        print("results: ", results)
        # condition to stop recursion: we've reached the full width of the picture
        width, height = image.size
        if right_x > width:
            return results

        partial = image.crop((left_x, left_y, right_x, right_y))
        if show:
            partial.show()
        partial = self.prepare_image(partial)

        step = height // 10

        # is there a number in this part of the image? 
        res, prop = self.predict_result(partial)
        print("result: ", res, ". probabilities: ", prop)
        # only count this result if the network is at least 50% sure
        if prop[res] >= 0.5:        
            results.append(res)
            # step is 80% of the partial image's size (which is equivalent to the original image's height) 
            step = int(height * 0.8)
            print("found valid result")
        else:
            # if there is no number found we take smaller steps
            step = height // 20 
        print("step: ", step)
        # recursive call with modified positions ( move on step variables )
        return self.partial_img_rec(image, (left_x + step, left_y), (right_x + step, right_y), results = results)

    def individual_digits(self, img):
        """ uses partial_img_rec to predict individual digits in square images """
        assert type(img) == PIL.JpegImagePlugin.JpegImageFile or type(img) == PIL.PngImagePlugin.PngImageFile or type(img) == PIL.Image.Image

        return self.partial_img_rec(img, (0,0), (img.size[0], img.size[1]), results=[])

    def test_individual_digits(self):
        """ test partial_img_rec with some individual digits (shape: square) 
            saved in the folder 'individual_test' following the pattern 'number_digit.jpg' """
        cnt_right, cnt_wrong = 0,0
        folder_content = os.listdir(".\\individual_test")

        for imageName in folder_content:
            # image file must be a jpg or png
            assert imageName[-4:] == ".jpg" or imageName[-4:] == ".png"
            correct_res = int(imageName[0])
            image = PIL.Image.open(".\\individual_test\\" + imageName).convert("L")
            # only square images in this test
            if image.size[0]  != image.size[1]:
                print(imageName, " has the wrong proportions: ", image.size,". It has to be a square.")
                continue 
            predicted_res = self.individual_digits(image)

            if predicted_res == []:
                print("No prediction possible for ", imageName)
            else:
                predicted_res = predicted_res[0]

            if predicted_res != correct_res:
                print("error in partial_img-rec! Predicted ", predicted_res, ". The correct result would have been ", correct_res)
                cnt_wrong += 1
            else:
                cnt_right += 1
                print("correctly predicted ",imageName)
        print(cnt_right, " out of ", cnt_right + cnt_wrong," digits were correctly recognised. The success rate is therefore ", (cnt_right / (cnt_right + cnt_wrong)) * 100," %.")

    def multiple_digits(self, img):
        """ takes as input an image without unnecessary whitespace surrounding the digits """

        #assert type(img) == myImage
        width, height = img.size
        # start with the first square part of the image
        res_list = self.partial_img_rec(img, (0,0),(height ,height), results = [])
        res_str = ""
        for elem in res_list:
            res_str += str(elem)
        return res_str

    def test_multiple_digits(self):
        """ tests the function 'multiple_digits' using some images saved in the folder 'multi_test'.
            These images contain multiple handwritten digits without much whitespace surrounding them.
            The correct solutions are saved in the files' names followed by the character '_'. """

        cnt_right, cnt_wrong = 0,0
        folder_content = os.listdir(".\\multi_test")
        for imageName in folder_content:
            # image file must be a jpg or png
            assert imageName[-4:] == ".jpg" or imageName[-4:] == ".png"            
            image = PIL.Image.open(".\\multi_test\\" + imageName).convert("L")

            correct_res = imageName.split("_")[0]
            predicted_res = self.multiple_digits(image)
            if correct_res == predicted_res:
                cnt_right += 1
            else:
                cnt_wrong += 1
                print("Error in multiple_digits! The network predicted ", predicted_res, " but the correct result would have been ", correct_res)

        print("The network predicted correctly ", cnt_right, " out of ", cnt_right + cnt_wrong, " pictures. That's a success rate of ", cnt_right / (cnt_right + cnt_wrong) * 100, "%.")

network = mnist_network()
# this is the image shown above
result = network.individual_digits(PIL.Image.open(".\\individual_test\\7(2)_digit.jpg"))


Answer 1

Gee*_*ode 5

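Update:

You have three options to get better performance in this particular task:

Use a convolutional network, as it performs better on tasks with spatial data such as images, and it usually generalizes better on problems like this one.
Use, create, or generate more pictures of your type and train your network with them, so that it can learn them as well.
Preprocess your images so that they are better aligned with the original MNIST images the network was trained on.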
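For the second option, here is a minimal sketch of how own digits could be appended to the flattened MNIST arrays used in the question. The function names shift_image and augment_with_own_digits, the shift range, and the dummy data are all assumptions made for illustration. It expects own images as 28 x 28 arrays already scaled to 0-1 with a dark background.

```python
import numpy as np

def shift_image(img, dx, dy):
    """Shift a 2-D image by (dx, dy) pixels, filling the vacated border with 0."""
    h, w = img.shape
    out = np.zeros_like(img)
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def augment_with_own_digits(X_train, y_train, own_images, own_labels, shifts=(-2, 0, 2)):
    """Append shifted copies of own digits to (n, 784) inputs and (n, 10) one-hot labels."""
    extra_x, extra_y = [], []
    one_hot = np.eye(10, dtype=y_train.dtype)
    for img, label in zip(own_images, own_labels):
        for dx in shifts:
            for dy in shifts:
                extra_x.append(shift_image(img, dx, dy).reshape(-1))
                extra_y.append(one_hot[label])
    X_new = np.concatenate([X_train, np.array(extra_x, dtype=X_train.dtype)])
    y_new = np.concatenate([y_train, np.array(extra_y, dtype=y_train.dtype)])
    return X_new, y_new

# Dummy stand-ins for the MNIST arrays and one own image labelled 9.
X = np.zeros((5, 784), dtype="float32")
y = np.eye(10, dtype="float32")[[0, 1, 2, 3, 4]]
own = [np.pad(np.ones((4, 4), dtype="float32"), 12)]   # 28x28 with a 4x4 blob
X2, y2 = augment_with_own_digits(X, y, own, [9])
print(X2.shape, y2.shape)
```

The enlarged arrays can then be passed to model.fit in place of X_train and y_train. A handful of own samples is small next to 60,000 MNIST images, so generating several shifted copies per photo is one simple way to give them more weight.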

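I just did an experiment. I looked at MNIST images, one for each digit. Then I took your images and applied some of the preprocessing I suggested to you earlier:

1. I applied a threshold, but only downwards, to eliminate background noise, because the original MNIST data has only a minimal threshold for the blank background: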

image[image < 0.1] = 0.


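2. Surprisingly, the size of the digit inside the image proved to be crucial, so I scaled the digit inside the 28 x 28 image so that there is more padding around it.

3. I inverted the images, because the MNIST data from Keras is inverted as well.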

image = ImageOps.invert(image)


4. Finally I scaled the data, as we did for training:

image = image / 255.
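The four steps can be combined into one function. What follows is a sketch under stated assumptions, not the exact code used for the experiment: the names to_mnist_style and resize_nearest are hypothetical, the input is assumed to be a 2-D uint8 greyscale array with dark ink on light paper, and the 20-pixel box follows the commonly cited description of MNIST (digits size-normalised into a 20 x 20 box inside the 28 x 28 image). Nearest-neighbour resizing is used only to keep the example free of extra dependencies.

```python
import numpy as np

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of a 2-D array."""
    rows = np.arange(new_h) * img.shape[0] // new_h
    cols = np.arange(new_w) * img.shape[1] // new_w
    return img[rows][:, cols]

def to_mnist_style(gray, box=20, size=28, noise_floor=0.1):
    """Turn a greyscale photo into a flat (size*size,) float32 vector."""
    img = 255.0 - gray.astype("float32")    # invert: bright digit, dark background
    img = img / 255.0                       # scale to 0-1 as in training
    img[img < noise_floor] = 0.0            # only remove faint background noise
    ys, xs = np.nonzero(img)
    if len(ys) == 0:                        # blank image: nothing to crop
        return np.zeros(size * size, dtype="float32")
    digit = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # Scale the longer side to `box` pixels, keeping the aspect ratio.
    h, w = digit.shape
    scale = box / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    digit = resize_nearest(digit, new_h, new_w)
    # Paste into the middle of a blank canvas; the border is the padding.
    canvas = np.zeros((size, size), dtype="float32")
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = digit
    return canvas.reshape(-1)

# Toy photo: white paper with one thick dark vertical stroke.
photo = np.full((100, 100), 255, dtype=np.uint8)
photo[10:90, 40:60] = 0
vec = to_mnist_style(photo).reshape(28, 28)
print(vec.shape, vec.max())
```

The returned vector has the shape predict_result expects. With real photos, a smoother resampling filter than nearest-neighbour would probably preserve the stroke edges better.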


After the preprocessing, I trained the model on the MNIST dataset with the parameters epochs=12, batch_size=200, and got these results:

Result: 1, probability: 0.6844741106033325

result:  1 . probabilities:  [2.0584749904628552e-07, 0.9875971674919128, 5.821426839247579e-06, 4.979299319529673e-07, 0.012240586802363396, 1.1566483948399764e-07, 2.382085284580171e-08, 0.00013023221981711686, 9.620113416985987e-08, 2.5273093342548236e-05]


Result: 6, probability: 0.9221984148025513

result:  6 . probabilities:  [9.130864782491699e-05, 1.8290626258021803e-07, 0.00020504613348748535, 2.1564576968557958e-07, 0.0002401985548203811, 0.04510130733251572, 0.9221984148025513, 1.9014490248991933e-07, 0.03216308355331421, 3.323434683011328e-08]


Result: 7, probability: 0.7105212807655334 Note:

result:  7 . probabilities:  [1.0372193770535887e-08, 7.988557626958936e-06, 0.00031014863634482026, 0.0056108818389475346, 2.434678014751057e-09, 3.2280522077599016e-07, 1.4190952857262573e-09, 0.9940618872642517, 1.612859932720312e-06, 7.102244126144797e-06]


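Your digit 9 was a bit tricky:

As far as I can tell, the model trained on MNIST picked up two main "features" of a 9: the upper part and the lower part. An upper part with a nice round shape, as in your image, is not a 9 for this model but mostly a 3. The lower part of a 9 in MNIST is mostly a straightened curve. So, because of the MNIST samples, your perfectly shaped 9 is always a 3 for your model, unless you retrain the model with a sufficient number of samples of 9s in your shape. To check this idea I did a sub-experiment with 9s:

My 9 with a skewed upper part (mostly OK for a 9 according to MNIST) but a slightly curly bottom (not OK for a 9 according to MNIST):

Result: 9, probability: 0.5365301370620728

My 9 with a skewed upper part (mostly OK for a 9 according to MNIST) and a straight bottom (OK for a 9 according to MNIST):

Result: 9, probability: 0.923724353313446

Your 9 with the misinterpreted shape properties:

Result: 3, probability: 0.8158268928527832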

result:  3 . probabilities:  [9.367801249027252e-05, 3.9978775021154433e-05, 0.0001467708352720365, 0.8158268928527832, 0.0005801069783046842, 0.04391581565141678, 6.44062723154093e-08, 7.099170943547506e-06, 0.09051419794559479, 0.048875387758016586]


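Finally, here is a demonstration of the importance of the image scaling (padding) that I called crucial above:

Result: 3, probability: 0.9845736622810364

Result: 9, probability: 0.923724353313446

So we can see that our model picked up some features that make it always classify a shape as 3 when the shape is oversized inside the image and has little padding.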
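A quick way to check whether a prepared image is oversized is to measure the bounding box of the digit. This is only a sketch: the helper names digit_extent and fits_box are hypothetical, and the 20-pixel limit is an assumption based on the usual description of MNIST, where digits are fitted into a 20 x 20 box.

```python
import numpy as np

def digit_extent(img, eps=0.0):
    """Return (height, width) of the bounding box of pixels brighter than eps."""
    ys, xs = np.nonzero(img > eps)
    if len(ys) == 0:
        return (0, 0)
    return (int(ys.max() - ys.min() + 1), int(xs.max() - xs.min() + 1))

def fits_box(img, box=20):
    """True if the digit's longer side is at most `box` pixels."""
    return max(digit_extent(img)) <= box

# Two toy 28x28 images: one nearly fills the frame, one leaves padding.
oversized = np.zeros((28, 28), dtype="float32")
oversized[1:27, 4:24] = 1.0
padded = np.zeros((28, 28), dtype="float32")
padded[4:24, 9:19] = 1.0

print(digit_extent(oversized), fits_box(oversized))
print(digit_extent(padded), fits_box(padded))
```

Images that fail this check are candidates for rescaling before they are passed to the network.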

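I think we can get better performance with a CNN, but the way of sampling and preprocessing is always crucial for getting the best performance in an ML task.

I hope it helps.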

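Update 2:

I found another issue, which I also checked and confirmed: the placement of the digit inside the image is crucial too, which makes sense for this type of neural network. A good example is the digits 7 and 9. In the MNIST dataset they sit near the bottom of the image, so a new 7 or 9 placed in the centre of the image is harder to classify or is classified wrongly. I tested this theory by shifting the 7s and 9s towards the bottom, leaving more room at the top of the image, and the result was almost 100% accuracy. As this is a spatial problem, I guess a CNN could eliminate it more effectively. However, it would be better if MNIST were aligned to the centre, or we can handle the alignment programmatically to avoid the problem.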
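One way to handle the placement programmatically is to shift each digit so that its intensity centre of mass lands on the image centre, which is how the MNIST images are commonly described as being centred. The function name center_by_mass is hypothetical and the code is a sketch, not a verified reproduction of the MNIST pipeline. Top-heavy digits such as 7 and 9 end up lower in the frame under this rule, which would be consistent with the observation above.

```python
import numpy as np

def center_by_mass(img):
    """Shift a 2-D image so its intensity centre of mass sits at the image centre."""
    total = img.sum()
    if total == 0:
        return img.copy()
    h, w = img.shape
    ys, xs = np.indices(img.shape)
    cy = (ys * img).sum() / total
    cx = (xs * img).sum() / total
    # Whole-pixel offsets needed to move the centre of mass to the middle.
    dy = int(round((h - 1) / 2 - cy))
    dx = int(round((w - 1) / 2 - cx))
    shifted = np.zeros_like(img)
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    shifted[max(0, dy):max(0, dy) + src.shape[0],
            max(0, dx):max(0, dx) + src.shape[1]] = src
    return shifted

# Toy example: a 4x4 blob in the top-left corner of a 28x28 image.
img = np.zeros((28, 28), dtype="float32")
img[2:6, 3:7] = 1.0
centered = center_by_mass(img)
print(np.argwhere(centered > 0).min(axis=0), np.argwhere(centered > 0).max(axis=0))
```

Applying this after scaling the digit and before flattening the image should make the placement of own photos more comparable to the training data.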
