How to improve digit recognition of a model trained on MNIST?

Question

you*_*nda 11 java opencv machine-learning image-recognition mnist

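I am working on handprinted multi-digit recognition in Java, using the OpenCV library for preprocessing and segmentation, and a Keras model trained on MNIST (accuracy of 0.98) for recognition.

The recognition seems to work quite well, apart from one thing. The network quite often fails to recognize ones (the digit "one"). I can't figure out whether this happens because of the preprocessing or an incorrect implementation of the segmentation, or whether a network trained on standard MNIST just hasn't seen ones that look like my test cases.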

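Here is what the problematic digits look like after preprocessing and segmentation:

[image] becomes [image] and is classified as 4.

[image] becomes [image] and is classified as 7.

[image] becomes [image] and is classified as 4. And so on...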

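Is this something that could be fixed by improving the segmentation process? Or rather by enhancing the training set?

Edit: Enhancing the training set (data augmentation) would definitely help, and I am already testing that; the question of correct preprocessing still remains.

My preprocessing consists of resizing, converting to grayscale, binarization, inversion, and dilation. Here is the code: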

Mat resized = new Mat();
Imgproc.resize(image, resized, new Size(), 8, 8, Imgproc.INTER_CUBIC);

Mat grayscale = new Mat();
Imgproc.cvtColor(resized, grayscale, Imgproc.COLOR_BGR2GRAY);

Mat binImg = new Mat(grayscale.size(), CvType.CV_8U);
Imgproc.threshold(grayscale, binImg, 0, 255, Imgproc.THRESH_OTSU);

Mat inverted = new Mat();
Core.bitwise_not(binImg, inverted);

Mat dilated = new Mat(inverted.size(), CvType.CV_8U);
int dilation_size = 5;
Mat kernel = Imgproc.getStructuringElement(Imgproc.CV_SHAPE_CROSS, new Size(dilation_size, dilation_size));
Imgproc.dilate(inverted, dilated, kernel, new Point(-1,-1), 1);

The preprocessed image is then segmented into individual digits as follows:

List<Mat> digits = new ArrayList<>();
List<MatOfPoint> contours = new ArrayList<>();
Imgproc.findContours(preprocessed.clone(), contours, new Mat(), Imgproc.RETR_EXTERNAL, Imgproc.CHAIN_APPROX_SIMPLE);

// code to sort contours
// code to check that contour is a valid char

List<Rect> rects = new ArrayList<>();

for (MatOfPoint contour : contours) {
     Rect boundingBox = Imgproc.boundingRect(contour);
     Rect rectCrop = new Rect(boundingBox.x, boundingBox.y, boundingBox.width, boundingBox.height);

     rects.add(rectCrop);
}

for (int i = 0; i < rects.size(); i++) {
    Rect x = rects.get(i);
    Mat digit = new Mat(preprocessed, x);

    int border = 50;
    Mat result = digit.clone();
    Core.copyMakeBorder(result, result, border, border, border, border, Core.BORDER_CONSTANT, new Scalar(0, 0, 0));

    Imgproc.resize(result, result, new Size(28, 28));
    digits.add(result);
}

Answer 1

SiR*_*SiR 5

I believe your problem is the dilation process. I understand that you want to normalize image sizes, but you shouldn't break the proportions: resize to the maximum allowed along one axis (the one that permits the largest rescale without letting the other axis exceed the maximum size) and fill the rest of the image with the background color. It's not that "standard MNIST just hasn't seen the number one which looks like your test cases"; you are making your images look like different digits the network was trained on (the ones they get recognized as).

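If you check the aspect ratio of your images (source versus post-processed), you can see that you did not just resize the image but "distorted" it. This could be the result of either non-uniform dilation or incorrect resizing.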
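The resizing rule described above can be sketched as a small calculation. This is only an illustration under assumptions: the class and method names (`AspectFit`, `fit`) are hypothetical, it works on plain integers rather than OpenCV `Mat` objects so that it runs without the native library, and the 20-pixel box inside a 28-pixel canvas is just one plausible choice of target sizes.

```java
/** Sketch (hypothetical helper): aspect-preserving resize plus padding for a digit crop. */
final class AspectFit {

    /**
     * Returns {newWidth, newHeight, padLeft, padTop} for fitting a
     * srcW x srcH crop into a square canvas of side `canvas`, with the
     * longer side scaled to `box` pixels and the proportions kept intact.
     */
    static int[] fit(int srcW, int srcH, int box, int canvas) {
        if (srcW <= 0 || srcH <= 0 || box <= 0 || canvas < box) {
            throw new IllegalArgumentException("invalid dimensions");
        }
        // One scale factor for both axes: the smaller ratio lets the longer
        // side reach `box` without the other side exceeding it.
        double scale = Math.min((double) box / srcW, (double) box / srcH);
        int newW = Math.max(1, (int) Math.round(srcW * scale));
        int newH = Math.max(1, (int) Math.round(srcH * scale));
        // The remaining space is filled with background, split evenly.
        int padLeft = (canvas - newW) / 2;
        int padTop = (canvas - newH) / 2;
        return new int[] {newW, newH, padLeft, padTop};
    }

    public static void main(String[] args) {
        // A tall, thin crop such as a handwritten "1".
        int[] r = fit(12, 60, 20, 28);
        System.out.println(r[0] + "x" + r[1] + " at (" + r[2] + "," + r[3] + ")");
    }
}
```

With OpenCV, the resulting size would presumably be passed to `Imgproc.resize` and the padding to `Core.copyMakeBorder`, instead of padding a fixed border and then resizing straight to 28x28.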

Answer 2

f4f*_*f4f 5

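A few answers have already been posted, but none of them answers your actual question about image preprocessing.

For my part, I don't see any major problems with your implementation as long as it is a research project; well done.

But there is one thing you may have missed. Mathematical morphology has basic operations: erosion and dilation (which you use). It also has complex operations: various combinations of the basic ones (e.g. opening and closing). The Wikipedia link is not the best computer vision reference, but you can start with it to get the idea.

It is usually better to use opening instead of erosion and closing instead of dilation, since the original binary image then changes much less (while the desired effect of cleaning sharp edges or filling gaps is still achieved). So in your case you should try closing (image dilation followed by erosion with the same kernel). An extra-small 8*8 image is greatly modified when you dilate it even with a 1*1 kernel (1 pixel is more than 16% of the image), an effect that is smaller on larger images.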

To visualize the idea, see the following pictures (from the OpenCV tutorials: 1, 2):

Dilation:

Closing:
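The difference between the two operations can also be checked numerically. The following is a minimal sketch on plain `boolean` arrays rather than OpenCV `Mat` objects; the class `BinaryMorphology`, the square kernel, and the choice to ignore neighbours outside the image are all assumptions made for illustration, not the code used in the question.

```java
/** Sketch (hypothetical helper): binary dilation, erosion and closing with a square kernel. */
final class BinaryMorphology {

    /** Applies a (2r+1)x(2r+1) square kernel; neighbours outside the image are ignored. */
    private static boolean[][] apply(boolean[][] img, int r, boolean dilate) {
        int h = img.length, w = img[0].length;
        boolean[][] out = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Dilation: set if ANY neighbour is set; erosion: set if ALL are.
                boolean value = !dilate;
                for (int dy = -r; dy <= r; dy++) {
                    for (int dx = -r; dx <= r; dx++) {
                        int ny = y + dy, nx = x + dx;
                        if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                        if (dilate) value |= img[ny][nx];
                        else value &= img[ny][nx];
                    }
                }
                out[y][x] = value;
            }
        }
        return out;
    }

    static boolean[][] dilate(boolean[][] img, int r) { return apply(img, r, true); }

    static boolean[][] erode(boolean[][] img, int r) { return apply(img, r, false); }

    /** Closing: dilation followed by erosion with the same kernel. */
    static boolean[][] close(boolean[][] img, int r) { return erode(dilate(img, r), r); }

    static int count(boolean[][] img) {
        int n = 0;
        for (boolean[] row : img) for (boolean v : row) if (v) n++;
        return n;
    }

    public static void main(String[] args) {
        // 8x8 image with a one-pixel-wide vertical stroke, like a thin "1".
        boolean[][] one = new boolean[8][8];
        for (int y = 2; y <= 5; y++) one[y][4] = true;
        System.out.println("original: " + count(one));
        System.out.println("dilated:  " + count(dilate(one, 1)));
        System.out.println("closed:   " + count(close(one, 1)));
    }
}
```

On this tiny example the dilated stroke grows from 4 to 18 set pixels, while closing returns the original 4, which is the "changes much less" behaviour described above.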

Hope it helps.

Answer 3

you*_*nda 1

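After some research and experiments, I came to the conclusion that the image preprocessing itself was not the problem (I did change some of the suggested parameters, e.g. dilation size and shape, but they were not crucial to the results). What did help, however, were the following two things: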

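1. As @f4f noticed, I needed to collect my own dataset with real-world data. This already helped tremendously.
2. I made important changes to my segmentation preprocessing. After getting the individual contours, I first size-normalize the images to fit into a 20x20 pixel box (as they are in MNIST). After that I center the box in the middle of a 28x28 image using the center of mass (which for binary images is the mean value across both dimensions).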
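The centering step can be sketched as follows. This is an illustration under assumptions, not the code actually used: it works on plain `int[][]` pixel arrays instead of OpenCV `Mat` objects, the names `CentroidCentering`, `centerOfMass` and `centerOnCanvas` are hypothetical, and it expects the digit to have already been scaled to fit the 20x20 box.

```java
/** Sketch (hypothetical helper): place a digit on a canvas by its center of mass. */
final class CentroidCentering {

    /** Returns {cx, cy}: the intensity-weighted mean column and row of the image. */
    static double[] centerOfMass(int[][] img) {
        double sum = 0, sx = 0, sy = 0;
        for (int y = 0; y < img.length; y++) {
            for (int x = 0; x < img[y].length; x++) {
                int v = img[y][x];
                sum += v;
                sx += (double) v * x;
                sy += (double) v * y;
            }
        }
        if (sum == 0) throw new IllegalArgumentException("image has no foreground");
        return new double[] {sx / sum, sy / sum};
    }

    /** Copies img into a size x size canvas so that its center of mass lands near the canvas center. */
    static int[][] centerOnCanvas(int[][] img, int size) {
        double[] c = centerOfMass(img);
        // Canvas center in pixel-index coordinates (13.5 for a 28x28 canvas).
        double mid = (size - 1) / 2.0;
        int offX = (int) Math.round(mid - c[0]);
        int offY = (int) Math.round(mid - c[1]);
        int[][] canvas = new int[size][size];
        for (int y = 0; y < img.length; y++) {
            for (int x = 0; x < img[y].length; x++) {
                int ty = y + offY, tx = x + offX;
                // Pixels shifted outside the canvas are dropped.
                if (ty >= 0 && ty < size && tx >= 0 && tx < size) {
                    canvas[ty][tx] = img[y][x];
                }
            }
        }
        return canvas;
    }

    public static void main(String[] args) {
        // 20x20 box with a vertical stroke near the left edge.
        int[][] digit = new int[20][20];
        for (int y = 0; y < 20; y++) digit[y][2] = 255;
        double[] c = centerOfMass(centerOnCanvas(digit, 28));
        System.out.println("center of mass after centering: " + c[0] + ", " + c[1]);
    }
}
```

Because the shift is rounded to whole pixels, the center of mass ends up within half a pixel of the canvas center rather than exactly on it.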

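Of course, there are still difficult segmentation cases, such as overlapping or connected digits, but the above changes answered my initial question and improved my classification performance.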

Asked:	6 years, 1 month ago
Viewed:	284 times
Active:	6 years ago