Bounding box detection of characters/digits

spa*_*del 6 python ocr opencv

I have images that look like this:

(image)

I want to find the bounding boxes of the 8 digits. My first attempt was to use cv2 with the following code:

import cv2
import matplotlib.pyplot as plt
import cvlib as cv
from cvlib.object_detection import draw_bbox

im = cv2.imread('31197402.png')
bbox, label, conf = cv.detect_common_objects(im)
output_image = draw_bbox(im, bbox, label, conf)
plt.imshow(output_image)
plt.show()

Unfortunately, this doesn't work. Does anyone have an idea?

sta*_*ine 11

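The problem with your solution is probably the input image, which is of very poor quality. There is hardly any contrast between the characters and the background. The blob detection algorithm from cvlib is probably failing to distinguish character blobs from the background, producing a useless binary mask. Let's try to solve this using purely OpenCV.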

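I propose the following steps:

  1. Apply an adaptive threshold to get a reasonably good binary mask.
  2. Clean the binary mask of blob noise using an area filter.
  3. Improve the quality of the binary image using morphology.
  4. Get the outer contour of each character and fit a bounding rectangle to each character blob.
  5. Crop each character using the previously computed bounding rectangle.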

Let's look at the code:

# Import cv2 and numpy:
import numpy as np
import cv2

# Set image path
path = "C:/opencvImages/"
fileName = "mrrm9.png"

# Read input image:
inputImage = cv2.imread(path + fileName)
inputCopy = inputImage.copy()

# Convert BGR to grayscale:
grayscaleImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)

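There is not much to discuss so far: we just read the BGR image and convert it to grayscale. Now, let's apply an adaptive threshold using the Gaussian method. This is the tricky part, because the parameters are tuned by hand depending on the quality of the input. For each pixel, the method computes a local threshold from the Gaussian-weighted mean of its windowSize x windowSize neighborhood, so the separation between foreground and background adapts to local brightness. An additional constant, windowConstant, is subtracted from that mean to fine-tune the output (a negative value, as used here, raises the threshold slightly above the local mean):

# Set the adaptive thresholding (Gaussian) parameters:
windowSize = 31
windowConstant = -1
# Apply the threshold:
binaryImage = cv2.adaptiveThreshold(grayscaleImage, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                    cv2.THRESH_BINARY, windowSize, windowConstant)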
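
To make the role of the two parameters more concrete, here is a minimal NumPy sketch of the simpler mean-based variant of adaptive thresholding. It is not the Gaussian-weighted method used in the answer and not OpenCV's actual implementation; the function name `adaptive_mean_threshold` and the synthetic test image are assumptions made purely for illustration.

```python
import numpy as np

def adaptive_mean_threshold(gray, window_size, constant):
    """Simplified mean-based adaptive threshold (illustrative sketch only)."""
    if window_size % 2 == 0 or window_size < 3:
        raise ValueError("window_size must be odd and at least 3")
    pad = window_size // 2
    # Replicate the border so every pixel has a full neighbourhood
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    # Integral image with a leading row and column of zeros
    integral = np.pad(padded.cumsum(axis=0).cumsum(axis=1),
                      ((1, 0), (1, 0)), mode="constant")
    h, w = gray.shape
    k = window_size
    # Sum of each k x k window from four integral-image lookups
    window_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
                  - integral[k:k + h, :w] + integral[:h, :w])
    local_mean = window_sum / (k * k)
    # A pixel is foreground when it is brighter than (local mean - constant)
    return np.where(gray > local_mean - constant, 255, 0).astype(np.uint8)

# Synthetic low-contrast input: a left-to-right brightness ramp with a faint stroke
image = np.tile(np.linspace(50, 200, 64), (32, 1))
image[10:22, 30:34] += 12  # the stroke is only 12 grey levels brighter than its surroundings
image = image.astype(np.uint8)

mask = adaptive_mean_threshold(image, window_size=15, constant=-6)

# A single global threshold cannot isolate the stroke, because the ramp
# is darker than the stroke on the left and brighter on the right
global_mask = np.where(image > 128, 255, 0).astype(np.uint8)
```

Because each pixel is compared against its own neighbourhood, the faint stroke survives in `mask` even though the background brightness varies far more than the stroke contrast. The same reasoning explains why `windowSize` and `windowConstant` usually need manual tuning per image.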

You get this nice binary image:

(image)

Now, as you can see, the image has some blob noise. Let's apply an area filter to get rid of it. The noise blobs are smaller than the target blobs of interest, so we can easily filter them out based on area, like this:

# Perform an area filter on the binary blobs:
componentsNumber, labeledImage, componentStats, componentCentroids = \
    cv2.connectedComponentsWithStats(binaryImage, connectivity=4)

# Set the minimum pixels for the area filter:
minArea = 20

# Get the indices/labels of the remaining components based on the area stat
# (skip the background component at index 0)
remainingComponentLabels = [i for i in range(1, componentsNumber)
                            if componentStats[i][cv2.CC_STAT_AREA] >= minArea]

# Filter the labeled pixels based on the remaining labels,
# assign pixel intensity to 255 (uint8) for the remaining pixels
filteredImage = np.where(np.isin(labeledImage, remainingComponentLabels), 255, 0).astype('uint8')
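
The area filter can also be read as a lookup table indexed by label. The following sketch shows the idea on a tiny hand-made label image, using only NumPy; the array `labeled` and the value of `min_area` are made up for illustration, and the areas are counted with `np.bincount` instead of being read from the OpenCV stats array.

```python
import numpy as np

# Hypothetical label image as a connected-components pass might return it:
# 0 = background, 1 = a large blob, 2 = a single-pixel speck of noise
labeled = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
])

min_area = 3

# Pixel count per label (the same quantity as the area column of the stats array)
areas = np.bincount(labeled.ravel())

# Boolean lookup table: keep[label] is True when that label survives the filter
keep = areas >= min_area
keep[0] = False  # never keep the background label

# Index the table with the label image to build the filtered mask in one step
filtered = np.where(keep[labeled], 255, 0).astype(np.uint8)
```

With real data, the areas would presumably come from `componentStats[:, cv2.CC_STAT_AREA]`, and `keep[labeledImage]` would replace the `np.isin` call; the result should be the same mask.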

This is the filtered image:

(image)

We can improve the quality of this image with some morphology. Some characters seem to be broken (look at the first 3: it is split into two separate blobs). We can join them by applying a closing operation:

# Set kernel (structuring element) size:
kernelSize = 3

# Set operation iterations:
opIterations = 1

# Get the structuring element:
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))

# Perform closing:
closingImage = cv2.morphologyEx(filteredImage, cv2.MORPH_CLOSE, maxKernel,
                                iterations=opIterations, borderType=cv2.BORDER_REFLECT101)
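
Closing is a dilation followed by an erosion with the same structuring element. As a rough illustration of why that rejoins a broken character, here is a NumPy-only sketch with a fixed 3 x 3 square kernel; the helper names `dilate3x3`, `erode3x3` and `close3x3` are hypothetical, and the code is a teaching aid rather than a substitute for `cv2.morphologyEx`.

```python
import numpy as np

def dilate3x3(mask):
    """Binary dilation: a pixel turns on if any pixel in its 3x3 neighbourhood is on."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def erode3x3(mask):
    """Binary erosion: a pixel stays on only if its whole 3x3 neighbourhood is on."""
    h, w = mask.shape
    # Pad with True so the image border alone never erodes a shape
    padded = np.pad(mask, 1, mode="constant", constant_values=True)
    out = np.ones_like(mask)
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out

def close3x3(mask):
    """Closing = dilation followed by erosion."""
    return erode3x3(dilate3x3(mask))

# A horizontal stroke broken by a one-pixel gap at column 5
stroke = np.zeros((7, 11), dtype=bool)
stroke[3, 2:5] = True
stroke[3, 6:9] = True

closed = close3x3(stroke)
```

The dilation grows both halves until they touch, and the erosion then shrinks the merged blob back to roughly its original thickness, so the gap is filled without fattening the stroke. Gaps wider than the kernel can bridge would need a larger kernel or more iterations, at the risk of merging neighbouring characters.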

This is the "closed" image:

(image)

Now, you want to get the bounding boxes of each character. Let's detect the outer contour of each blob and fit a nice rectangle around it:

# Find the contours/blobs on the closed image:
contours, hierarchy = cv2.findContours(closingImage, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)

# The bounding rectangles will be stored here:
boundRect = []

# Keep only the outer contours (those without a parent in the hierarchy):
for i, c in enumerate(contours):
    if hierarchy[0][i][3] == -1:
        contourPoly = cv2.approxPolyDP(c, 3, True)
        boundRect.append(cv2.boundingRect(contourPoly))

# Draw the bounding boxes on the (copied) input image:
color = (0, 255, 0)
for x, y, w, h in boundRect:
    cv2.rectangle(inputCopy, (x, y), (x + w, y + h), color, 2)
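
One practical detail: as far as I know, the order in which contours are returned is not guaranteed to match the left-to-right reading order of the digits. If the order matters (for example, to reassemble the 8-digit number from per-character crops), the rectangles can be sorted by their x coordinate first. The sketch below uses made-up rectangle values and hypothetical variable names.

```python
# Hypothetical (x, y, w, h) rectangles in the arbitrary order a contour search might return
bound_rects = [(120, 12, 18, 30), (15, 10, 17, 31), (64, 11, 19, 30)]

# Sort by the x coordinate of the top-left corner to get left-to-right order
sorted_rects = sorted(bound_rects, key=lambda rect: rect[0])

# Crop slices follow NumPy's row-major convention: rows (y) first, then columns (x)
crop_slices = [(slice(y, y + h), slice(x, x + w)) for x, y, w, h in sorted_rects]
```

Each entry of `crop_slices` can be used directly as an index into an image array, for example `image[crop_slices[0]]` for the leftmost character.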

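The last for loop is pretty much optional. It takes each bounding rectangle from the list and draws it on the input image, so you can see each individual rectangle, like this:

(image)

Let's visualize that on the binary image:

(image)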

Additionally, if you want to crop each character using the bounding boxes we just got, you can do it like this:

# Crop the characters:
for i in range(len(boundRect)):
    # Get the roi for each bounding rectangle:
    x, y, w, h = boundRect[i]

    # Crop the roi:
    croppedImg = closingImage[y:y + h, x:x + w]
    cv2.imshow("Cropped Character: " + str(i), croppedImg)
    cv2.waitKey(0)

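This is how you can get the individual bounding boxes. Now, maybe you are trying to pass these images to an OCR. I tried passing the filtered binary image (after the closing operation) to pyocr (that's the OCR I'm using), and I got this output string: 31197402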

The code I used to run OCR on the closed image is this:

# Set the OCR libraries:
from PIL import Image
import pyocr
import pyocr.builders

# Set pyocr tools:
tools = pyocr.get_available_tools()
# The tools are returned in the recommended order of usage
tool = tools[0]

# Set OCR language:
langs = tool.get_available_languages()
lang = langs[0]

# Get string from image:
txt = tool.image_to_string(
    Image.open(path + "closingImage.png"),
    lang=lang,
    builder=pyocr.builders.TextBuilder()
)

print("Text is:" + txt)

Note that the OCR expects black characters on a white background, so you must invert the image first.
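
For an 8-bit binary mask, inverting just means subtracting every pixel from 255. The sketch below shows this on a tiny stand-in array (the name `closing_image` and its contents are made up for illustration); `cv2.bitwise_not` should give the same result on a real image.

```python
import numpy as np

# Stand-in for the closed binary mask: white (255) character pixels on black (0)
closing_image = np.array([[0, 255, 0],
                          [0, 255, 0],
                          [0, 255, 0]], dtype=np.uint8)

# Invert: characters become black (0) and the background becomes white (255)
inverted = 255 - closing_image
```

Since the OCR snippet loads `closingImage.png` from disk, the inverted image would presumably have to be written out first (for example with `cv2.imwrite`) before calling `Image.open`.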
