How do I draw bounding boxes around text regions in an image? (Even when the text is skewed!!)

Question

Tat*_*dia 3 opencv imagemagick bounding-box python-tesseract google-vision

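I am trying to detect and extract text from screenshots taken from advertisements for consumer products.

My code works with reasonable accuracy, but it fails to draw bounding boxes around skewed text regions.

Recently I tried the Google Vision API. It draws bounding boxes around almost every possible text region and recognizes the text in those regions very accurately. I am curious how I can achieve the same or a similar result!

My test image:

The Google Vision API result, with bounding boxes:

Thanks in advance :)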

Answer 1

Fle*_*n-X 10

There are open-source vision packages that can detect text in images with noisy backgrounds, with results comparable to Google's Vision API.

You can use EAST (Efficient and Accurate Scene Text detector) by Zhou et al., a simple fully convolutional architecture. https://arxiv.org/abs/1704.03155v2

Using Python:

Download the pretrained model from: https://www.dropbox.com/s/r2ingd0l3zt8hxs/frozen_east_text_detection.tar.gz?dl=1. Extract the model into your current folder.

You will need OpenCV >= 3.4.2 to run the commands below.

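import cv2
import math  # needed by the decode function used further down

# Load the frozen EAST model obtained after extracting the archive
net = cv2.dnn.readNet("frozen_east_text_detection.pb")
frame = cv2.imread("<image_filename>")  # replace with the path to your image
inpWidth = inpHeight = 320  # default input size; EAST expects multiples of 32
# Prepare a blob to pass the image through the neural network,
# subtracting the mean values used while training the model.
# Positional arguments: scale factor 1.0, input size, mean, swapRB=True, crop=False.
image_blob = cv2.dnn.blobFromImage(frame, 1.0, (inpWidth, inpHeight), (123.68, 116.78, 103.94), True, False)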
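
The 320 used above is only one valid choice. The OpenCV text detection sample asks for input dimensions that are multiples of 32, so if you want an input size closer to your image's own size, something like the following sketch could be used. The helper name `nearest_multiple_of_32` is made up for illustration and is not part of OpenCV.

```python
def nearest_multiple_of_32(value, minimum=32):
    """Round a pixel dimension to the nearest multiple of 32 (hypothetical helper)."""
    # Integer arithmetic keeps the rounding deterministic.
    rounded = (int(value) + 16) // 32 * 32
    # Never return 0 for very small inputs.
    return max(minimum, rounded)


# Example: pick network input dimensions close to a 1000 x 750 image.
inp_width = nearest_multiple_of_32(1000)
inp_height = nearest_multiple_of_32(750)
print(inp_width, inp_height)
```

Larger inputs tend to pick up smaller text at the cost of speed, so this is a trade-off to experiment with rather than a fixed rule.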

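Now we have to define the output layers, which produce the geometry values of the detected text and the corresponding confidence scores (the output of a sigmoid function).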

output_layer = []
output_layer.append("feature_fusion/Conv_7/Sigmoid")
output_layer.append("feature_fusion/concat_3")

Next, we run a forward pass through the network to get the desired outputs.

net.setInput(image_blob)
output = net.forward(output_layer)
scores = output[0]
geometry = output[1]
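
For orientation: the decode function in the OpenCV sample multiplies cell indices by 4, which implies the model predicts on a grid at one quarter of the input resolution. Under that assumption, the two outputs should have the shapes computed below. The function name is hypothetical and only illustrates the arithmetic; checking `scores.shape` and `geometry.shape` on your own run is the reliable way to confirm.

```python
def expected_east_shapes(inp_width, inp_height, stride=4):
    """Return the assumed (scores, geometry) shapes for a given input size."""
    rows = inp_height // stride
    cols = inp_width // stride
    scores_shape = (1, 1, rows, cols)    # one text confidence per grid cell
    geometry_shape = (1, 5, rows, cols)  # 4 edge distances + 1 rotation angle per cell
    return scores_shape, geometry_shape


scores_shape, geometry_shape = expected_east_shapes(320, 320)
print(scores_shape, geometry_shape)
```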

Here I used the decode function defined in OpenCV's GitHub repository, https://github.com/opencv/opencv/blob/master/samples/dnn/text_detection.py, to convert the geometry values into box coordinates (lines 23 to 75).
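
Since line numbers in a file on the master branch can shift over time, here is a condensed sketch of what that decode step does, written from my reading of the sample rather than copied from it. The name `decode` is chosen to match the call used in this answer; the function in the sample may be named differently. It assumes each grid cell covers 4 pixels and that the geometry channels hold the distances to the top, right, bottom and left edges followed by the rotation angle in radians.

```python
import math


def decode(scores, geometry, score_threshold):
    """Turn EAST score/geometry maps into rotated rects and confidences."""
    boxes, confidences = [], []
    height = len(scores[0][0])
    width = len(scores[0][0][0])
    for y in range(height):
        for x in range(width):
            score = float(scores[0][0][y][x])
            if score < score_threshold:
                continue  # skip cells that are unlikely to contain text
            # Distances from this cell to the box edges, plus the angle.
            top = float(geometry[0][0][y][x])
            right = float(geometry[0][1][y][x])
            bottom = float(geometry[0][2][y][x])
            left = float(geometry[0][3][y][x])
            angle = float(geometry[0][4][y][x])
            cos_a, sin_a = math.cos(angle), math.sin(angle)
            h = top + bottom
            w = right + left
            # Each cell corresponds to a 4 x 4 pixel patch of the network input.
            offset_x = x * 4.0 + cos_a * right + sin_a * bottom
            offset_y = y * 4.0 - sin_a * right + cos_a * bottom
            p1 = (-sin_a * h + offset_x, -cos_a * h + offset_y)
            p3 = (-cos_a * w + offset_x, sin_a * w + offset_y)
            center = (0.5 * (p1[0] + p3[0]), 0.5 * (p1[1] + p3[1]))
            # OpenCV rotated rects are ((cx, cy), (w, h), angle in degrees).
            boxes.append((center, (w, h), -angle * 180.0 / math.pi))
            confidences.append(score)
    return [boxes, confidences]
```

Because the angle is kept in every box, the results are rotated rectangles rather than axis-aligned ones, which is what lets the boxes follow skewed text.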

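I used 0.5 as the box detection (confidence) threshold and 0.3 as the non-maximum suppression threshold. You can try different values to get better bounding boxes.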

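confThreshold = 0.5
nmsThreshold = 0.3
# decode() is the function taken from the OpenCV sample linked above
[boxes, confidences] = decode(scores, geometry, confThreshold)
# Drop overlapping rotated boxes, keeping the highest-scoring ones
indices = cv2.dnn.NMSBoxesRotated(boxes, confidences, confThreshold, nmsThreshold)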
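
To make the role of the two thresholds more concrete, here is a sketch of greedy non-maximum suppression. It is a deliberate simplification: it works on axis-aligned boxes given as (x1, y1, x2, y2), whereas `cv2.dnn.NMSBoxesRotated` operates on rotated rectangles, and details such as tie-breaking may differ. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold, nms_threshold):
    """Return indices of boxes kept after confidence filtering and NMS."""
    # Keep only confident boxes, visiting the most confident first.
    order = sorted(
        (i for i, s in enumerate(scores) if s > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in kept):
            kept.append(i)
    return kept


example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
example_scores = [0.9, 0.8, 0.7]
print(greedy_nms(example_boxes, example_scores, 0.5, 0.3))
```

A lower NMS threshold removes more overlapping boxes, while a higher confidence threshold discards weaker detections before suppression even starts.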

Finally, draw the boxes over the detected text in the image:

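import numpy as np

height_ = frame.shape[0]
width_ = frame.shape[1]
# Ratios between the original image size and the network input size
rW = width_ / float(inpWidth)
rH = height_ / float(inpHeight)

# NMSBoxesRotated returns an N x 1 array in older OpenCV releases and a
# flat array in newer ones; flattening handles both cases.
for i in np.array(indices).flatten():
    # get the 4 corners of the rotated rect
    vertices = cv2.boxPoints(boxes[i])
    # scale the corner coordinates back to the original image size
    vertices[:, 0] *= rW
    vertices[:, 1] *= rH
    for j in range(4):
        # cv2.line expects integer pixel coordinates
        p1 = (int(vertices[j][0]), int(vertices[j][1]))
        p2 = (int(vertices[(j + 1) % 4][0]), int(vertices[(j + 1) % 4][1]))
        cv2.line(frame, p1, p2, (0, 255, 0), 3)

# To save the image:
cv2.imwrite("maggi_boxed.jpg", frame)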
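
If it helps to see what the corner computation and rescaling amount to, the following pure-Python sketch reproduces the idea without OpenCV. It assumes a rotated rect in the form ((cx, cy), (w, h), angle in degrees); the function names are hypothetical, and the corner order is not guaranteed to match what `cv2.boxPoints` returns.

```python
import math


def rotated_rect_corners(rect):
    """Return the 4 corners of a rotated rect ((cx, cy), (w, h), angle_deg)."""
    (cx, cy), (w, h), angle_deg = rect
    angle = math.radians(angle_deg)
    # Half-extent vectors along the rectangle's width and height axes.
    ux, uy = 0.5 * w * math.cos(angle), 0.5 * w * math.sin(angle)
    vx, vy = -0.5 * h * math.sin(angle), 0.5 * h * math.cos(angle)
    return [
        (cx - ux - vx, cy - uy - vy),
        (cx + ux - vx, cy + uy - vy),
        (cx + ux + vx, cy + uy + vy),
        (cx - ux + vx, cy - uy + vy),
    ]


def scale_points(points, r_w, r_h):
    """Map points from network input coordinates to integer image pixels."""
    return [(int(round(x * r_w)), int(round(y * r_h))) for x, y in points]


corners = rotated_rect_corners(((0.0, 0.0), (40.0, 20.0), 0.0))
print(scale_points(corners, 2.0, 3.0))
```

Scaling x and y by different ratios is exact for the corner points themselves, but when the aspect ratio of the image differs a lot from the network input, the scaled shape is no longer a perfect rectangle, which is why the answer scales corners rather than the rect.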

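I have not experimented with different threshold values. Tuning them should give better results and may also eliminate cases where logos are misclassified as text.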

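Note: The model was trained on an English corpus, so Hindi words will not be detected. You can also read the paper, which describes the datasets it was tested on.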
