Cleaning up text images with OpenCV for OCR reading

Question

Ste*_*imo 5 python ocr opencv tesseract

I received some images that I need to process so that I can OCR some of the information in them. Here are the originals:

Original 1

Original 2

Original 3

Original 4

After processing them with this code:

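import cv2

# Load as grayscale (flag 0), binarize at a fixed threshold of 55,
# then apply a morphological opening with a 2x2 kernel to drop small specks.
img = cv2.imread('original_1.jpg', 0)
ret, thresh = cv2.threshold(img, 55, 255, cv2.THRESH_BINARY)
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2)))
cv2.imwrite('result_1.jpg', opening)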

I get these results:

Result 1

Result 2

Result 3

Result 4

As you can see, some images come out clean enough for OCR, while others still have noise left in the background.

Any suggestions on how to clean up the background?

Answer 1

sta*_*ine 3

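MH304's answer is very good and straightforward. If you cannot get a cleaner image with morphology or blurring, consider using an "area filter": that is, filter out every blob that does not reach a minimum area.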

Using OpenCV's connectedComponentsWithStats, here is a very basic C++ implementation of an area filter:

#include <opencv2/opencv.hpp>
#include <cstdlib>
#include <vector>

//bwImage is the 8-bit, single-channel binary input (blobs non-zero, background black)
cv::Mat outputLabels, stats, img_color, centroids;
int connectivity = 8; //assumption: the original snippet leaves this undefined

int numberofComponents = cv::connectedComponentsWithStats(bwImage, outputLabels,
stats, centroids, connectivity);

//one color per label; labels run from 0 to numberofComponents-1:
std::vector<cv::Vec3b> colors(numberofComponents);

//do not count the original background-> label = 0:
colors[0] = cv::Vec3b(0,0,0);

//Area threshold:
int minArea = 10; //10 px

for( int i = 1; i < numberofComponents; i++ ) {

    //random color for the current blob (visualization only):
    colors[i] = cv::Vec3b(rand()%256, rand()%256, rand()%256);

    //get the area of the current blob (row i of stats belongs to label i):
    auto blobArea = stats.at<int>(i, cv::CC_STAT_AREA);

    //apply the area filter:
    if ( blobArea < minArea )
    {
        //filter blob below minimum area:
        //small regions are painted with (ridiculous) pink color
        colors[i] = cv::Vec3b(248,48,213);

    }

}

Using the area filter, I get this result on the noisiest image:

Additional information:

Basically, the algorithm goes like this:

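1. Pass the binary image to connectedComponentsWithStats. The function computes the number of connected components, a matrix of labels, and an additional matrix of statistics (including blob area).
2. Prepare a color vector of size "numberOfcomponents"; this helps visualize the blobs that are actually being filtered. The colors are generated randomly with the rand function, three values in the range 0 to 255 for each pixel: BGR.
3. The background is black, so ignore this "connected component" (label 0) and its color (black).
4. Set the area threshold. All blobs or pixels below this area will be painted a (ridiculous) pink.
5. Loop through all the connected components (blobs) that were found, retrieve the area of the current blob from the stats matrix, and compare it with the area threshold.
6. If the area is below the threshold, paint the blob pink (in this case; usually you would want black).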

Archived:	5 years, 9 months ago
Views:	2,541
Last recorded:	5 years, 1 month ago