Detecting white text on a bright background with Tesseract

Question

Jon*_*han 3 java opencv tesseract tess4j pokemon-go

I'm having trouble reading white text on a bright background. Tesseract finds the text itself, but it can't recognize it correctly.

Image:

The result I keep getting is LanEerus, which honestly isn't that far off.

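What image preprocessing would fix this? Before attempting it in code, I preprocessed the image by hand in Photoshop to find out what should work in the first place.

I tried converting it to a bitmap, but that made the borders of the text very rough, and Tesseract just turned it into random characters.

Inverting the colors and/or converting to grayscale doesn't seem to help either.

Does anyone have an idea? I know this is a terrible background for this kind of text. Believe me, I wish the background were different!

My test code: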

File file = new File("C:\\tess\\lando.png");
ITesseract tess = new Tesseract();
tess.setDatapath("tessdata");

System.out.println(tess.doOCR(file));

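Edit
I've read through the "Improving Quality" guide, but I couldn't get those tips to work.

Edit 2
After using OpenCV to preprocess the image with grayscale conversion, color inversion, Gaussian blur, and adaptive thresholding, I do get a processed image, but the reading is no better. If anything, it's worse.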

Answer 1

sta*_*ine 9

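Here's one possible solution. It's written in Python, but it should be clear enough to port to Java. We'll apply a method called gain division. The idea is to build a model of the background and then weight each input pixel by that model. The output gain should be relatively constant across most of the image, which removes most of the background color variation. We can then use a morphological chain to clean up the result a bit. Let's look at the code: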

# imports:
import cv2
import numpy as np
# OCR imports:
from PIL import Image
import pyocr
import pyocr.builders

# image path
path = "D://opencvImages//"
fileName = "c552h.png"

# Reading an image in default mode:
inputImage = cv2.imread(path + fileName)

# Get local maximum:
kernelSize = 5
maxKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))
localMax = cv2.morphologyEx(inputImage, cv2.MORPH_CLOSE, maxKernel, None, None, 1, cv2.BORDER_REFLECT101)

# Perform gain division
gainDivision = np.where(localMax == 0, 0, (inputImage/localMax))

# Clip the values to [0,255]
gainDivision = np.clip((255 * gainDivision), 0, 255)

# Convert the mat type from float to uint8:
gainDivision = gainDivision.astype("uint8")
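To see why the division flattens the background, here is a minimal NumPy-only sketch on a synthetic one-row image. The helper name `gain_divide` and the sample values are made up for illustration, and the background model is passed in directly instead of being computed with a morphological closing. It also uses `np.divide` with a `where` mask, which avoids the divide-by-zero warning that `inputImage/localMax` can raise when the model contains zeros.

```python
import numpy as np

def gain_divide(image, local_max):
    """Divide an image by its background model and scale back to uint8.

    Hypothetical helper for illustration; not part of OpenCV.
    """
    image = image.astype(np.float32)
    local_max = local_max.astype(np.float32)
    ratio = np.zeros_like(image)
    # Divide only where the background model is non-zero; other pixels stay 0.
    np.divide(image, local_max, out=ratio, where=local_max != 0)
    return np.clip(255 * ratio, 0, 255).astype(np.uint8)

# Synthetic background that brightens from left to right.
background = np.array([[100, 125, 150, 175, 200]], dtype=np.uint8)

# Same row, with one darker "stroke" pixel at half the local background level.
image = background.copy()
image[0, 2] = 75

result = gain_divide(image, background)
print(result)  # [[255 255 127 255 255]]
```

The uneven background collapses to a constant 255, while the stroke keeps its contrast relative to its surroundings.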

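The first step is to apply gain division. The operations you need are straightforward: a morphological closing with a large rectangular structuring element and a few data type conversions. Be careful with the latter. This is the image you should see after applying the method: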

Great, the background is almost gone. Let's use Otsu's thresholding to get a binary image:

# Convert BGR (OpenCV's default channel order) to grayscale:
grayscaleImage = cv2.cvtColor(gainDivision, cv2.COLOR_BGR2GRAY)

# Get binary image via Otsu:
_, binaryImage = cv2.threshold(grayscaleImage, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
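For readers unfamiliar with Otsu's method, the sketch below shows the idea in plain NumPy: try every threshold and keep the one that maximizes the between-class variance of the two resulting pixel groups. The function name `otsu_threshold` is made up for this example, and the result is not guaranteed to match `cv2.threshold` value for value (ties, for instance, may be resolved differently).

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold t that maximizes between-class variance.

    Pixels <= t form one class, pixels > t the other.
    Illustrative sketch only; expects a uint8 array.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)

    weight_low = np.cumsum(hist)              # number of pixels with value <= t
    weight_high = hist.sum() - weight_low     # number of pixels with value > t
    sum_low = np.cumsum(hist * levels)
    sum_all = sum_low[-1]

    # Class means are only defined where both classes are non-empty.
    valid = (weight_low > 0) & (weight_high > 0)
    mean_low = np.zeros(256)
    mean_high = np.zeros(256)
    mean_low[valid] = sum_low[valid] / weight_low[valid]
    mean_high[valid] = (sum_all - sum_low[valid]) / weight_high[valid]

    between_class = weight_low * weight_high * (mean_low - mean_high) ** 2
    return int(np.argmax(between_class))

# Tiny bimodal sample: dark pixels at 50, bright pixels at 200.
gray = np.array([[50, 50, 200, 200],
                 [50, 200, 50, 200]], dtype=np.uint8)

t = otsu_threshold(gray)

# Same convention as THRESH_BINARY_INV: values above t become 0, the rest 255.
binary_inv = np.where(gray > t, 0, 255).astype(np.uint8)
print(binary_inv)
```

On this sample any threshold between the two modes separates them, so the dark pixels end up white and the bright pixels black in the inverted mask.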

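This is the binary image:

We now have a nice image of the text edges. If we Flood-Fill the background with white (and later invert the image), we end up with white text on a black background. However, we have to be careful with the characters: if a character's outline is broken, the Flood-Fill operation will erase it. So first, let's make sure our characters are closed by applying a morphological closing: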

# Set kernel (structuring element) size:
kernelSize = 3
# Set morph operation iterations:
opIterations = 1

# Get the structuring element:
morphKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernelSize, kernelSize))

# Perform closing:
binaryImage = cv2.morphologyEx( binaryImage, cv2.MORPH_CLOSE, morphKernel, None, None, opIterations, cv2.BORDER_REFLECT101 )
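As a rough illustration of what the closing does to a broken outline, here is a NumPy-only sketch of a 3 x 3 binary closing (dilation followed by erosion). The helper names are made up, and the border handling is a simplifying assumption: the image is padded with 0 for dilation and 255 for erosion, which is not the same as `cv2.BORDER_REFLECT101`.

```python
import numpy as np

def _neighbourhoods(img, pad_value):
    """Stack the nine 3x3-shifted views of img (hypothetical helper)."""
    padded = np.pad(img, 1, mode="constant", constant_values=pad_value)
    h, w = img.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def dilate3(img):
    # A pixel becomes white if any pixel in its 3x3 neighbourhood is white.
    return _neighbourhoods(img, 0).max(axis=0)

def erode3(img):
    # A pixel stays white only if its whole 3x3 neighbourhood is white.
    return _neighbourhoods(img, 255).min(axis=0)

def close3(img):
    # Closing = dilation followed by erosion.
    return erode3(dilate3(img))

# A horizontal stroke with a one-pixel gap at column 3.
stroke = np.zeros((5, 7), dtype=np.uint8)
stroke[2, :] = 255
stroke[2, 3] = 0

closed = close3(stroke)
print(closed[2])  # [255 255 255 255 255 255 255]
```

The gap is bridged, while the rows above and below the stroke return to black after the erosion, so the stroke does not get thicker.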

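This is the resulting image:

As you can see, the edges are stronger and, most importantly, closed. Now we can Flood-Fill the background with white. Here, the Flood-Fill seed point is at the image origin (x = 0, y = 0):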

# Flood fill (white + black):
cv2.floodFill(binaryImage, mask=None, seedPoint=(int(0), int(0)), newVal=(255))
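The warning about broken characters can be reproduced with a small pure-Python flood fill. This is only a sketch: `flood_fill` is a hypothetical 4-connected fill written for this example (4-connectivity is also the default for `cv2.floodFill`), and the 5 x 5 ring stands in for a character outline.

```python
from collections import deque
import numpy as np

def flood_fill(img, seed, new_val):
    """4-connected flood fill, in place. seed is (x, y), as in cv2.floodFill."""
    h, w = img.shape
    x0, y0 = seed
    old_val = img[y0, x0]
    if old_val == new_val:
        return img
    img[y0, x0] = new_val
    queue = deque([(x0, y0)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and img[ny, nx] == old_val:
                img[ny, nx] = new_val
                queue.append((nx, ny))
    return img

# A closed white outline around a single black interior pixel.
ring = np.zeros((5, 5), dtype=np.uint8)
ring[1:4, 1:4] = 255
ring[2, 2] = 0

# Closed outline: the fill from the origin cannot reach the interior.
closed_result = flood_fill(ring.copy(), (0, 0), 255)

# Broken outline: one missing outline pixel lets the fill leak inside.
broken = ring.copy()
broken[1, 2] = 0
broken_result = flood_fill(broken, (0, 0), 255)

print(closed_result[2, 2], broken_result[2, 2])  # 0 255
```

With the closed outline the interior pixel survives; with the one-pixel break the whole image turns white and the shape is lost, which is why the closing step comes first.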

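We get this image:

We're almost there. As you can see, the holes inside some characters (such as "a", "d", and "o") are not filled, which could cause trouble for the OCR. Let's try to fill them. We can take advantage of the fact that these holes are all child contours of a parent contour. We can isolate the child contours and apply a Flood-Fill once more to fill them. But first, don't forget to invert the image: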

# Invert image so target blobs are colored in white:
binaryImage = 255 - binaryImage

# Find the blobs on the binary image:
contours, hierarchy = cv2.findContours(binaryImage, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# Process the contours:
for i, c in enumerate(contours):

    # Get contour hierarchy:
    currentHierarchy = hierarchy[0][i][3]

    # Look only for children contours (the holes):
    if currentHierarchy != -1:

        # Get the contour bounding rectangle:
        boundRect = cv2.boundingRect(c)

        # Get the dimensions of the bounding rect:
        rectX = boundRect[0]
        rectY = boundRect[1]
        rectWidth = boundRect[2]
        rectHeight = boundRect[3]

        # Get the center of the bounding rectangle; it will act as
        # the seed point for the Flood-Fill:
        fx = rectX + 0.5 * rectWidth
        fy = rectY + 0.5 * rectHeight

        # Fill the hole:
        cv2.floodFill(binaryImage, mask=None, seedPoint=(int(fx), int(fy)), newVal=(0))

# Write result to disk:
cv2.imwrite("text.png", binaryImage, [cv2.IMWRITE_PNG_COMPRESSION, 0])
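The check on `hierarchy[0][i][3]` relies on how OpenCV lays out the hierarchy: each row is `[next, previous, first_child, parent]`, with -1 meaning "none". The sketch below uses a hand-written array as a stand-in for what `cv2.findContours` might return with `RETR_TREE` for two blobs, one of which has a hole. The values are invented for illustration and the real contour ordering may differ.

```python
import numpy as np

# Hand-written stand-in for a RETR_TREE hierarchy (shape 1 x N x 4).
# Each row is [next, previous, first_child, parent]; -1 means "none".
hierarchy = np.array([[
    [1, -1, -1, -1],   # contour 0: outer contour with no hole (think "L")
    [-1, 0, 2, -1],    # contour 1: outer contour of an "o"; first child is 2
    [-1, -1, -1, 1],   # contour 2: the hole of the "o"; its parent is 1
]])

# Same test as in the loop above: a contour with a parent is a nested one.
hole_indices = [i for i, row in enumerate(hierarchy[0]) if row[3] != -1]
print(hole_indices)  # [2]
```

Note that with `RETR_TREE` every nested contour has a parent, not only direct holes, so this test assumes the characters contain no deeper nesting.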

This is the resulting mask:

Cool, let's apply OCR. I'm using pyocr:

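# Get the first available OCR tool (e.g. Tesseract) and pick a language.
# "eng" is an assumption; use whichever language data you have installed:
tool = pyocr.get_available_tools()[0]
lang = "eng"

txt = tool.image_to_string(
    Image.open("text.png"),
    lang=lang,
    builder=pyocr.builders.TextBuilder()
)

print(txt)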

Output:

Landorus
