How to detect whether text is rotated 180 degrees or flipped upside down

Joe*_*Joe 6 python opencv tesseract

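I am working on a text recognition project. The text may be rotated 180 degrees. I tried tesseract-ocr from the terminal, but without success. Is there a way to detect and correct this? A sample of the text was attached as an image.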

tesseract input.png output

nat*_*ncy 2

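A simple way to detect whether text is rotated 180 degrees is to rely on the observation that text tends to be weighted towards the bottom: in upright text, more of the ink usually falls in the lower half of the image than in the upper half. The strategy is as follows: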

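  • Convert the image to grayscale
  • Apply a Gaussian blur
  • Threshold the image
  • Find the top/bottom half ROIs of the thresholded image
  • Count the non-zero array elements in each half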

Thresholded image

Find the ROIs for the top and bottom halves

Next, we split the image into its top and bottom sections

For each half, we count the non-zero array elements with cv2.countNonZero(). We get this:

('top', 4035)
('bottom', 3389)

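Compare the counts for the two halves: if the top half has more foreground pixels than the bottom half, the image is upside down (rotated 180 degrees). If it has fewer, the orientation is correct. Here 4035 > 3389, so the image is upside down.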
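The comparison can be written as a small predicate. This is a sketch rather than part of the original answer: the function name `is_upside_down` and the `min_ratio` safety margin are assumptions added for illustration. With the default `min_ratio=1.0` it reduces to the plain "top greater than bottom" rule described above.

```python
def is_upside_down(top_pixels, bottom_pixels, min_ratio=1.0):
    """Return True when the top half holds more foreground pixels.

    min_ratio is a hypothetical margin: raising it above 1.0 makes the
    check rotate only when the top half is clearly heavier.
    """
    if bottom_pixels == 0:
        # Avoid dividing by zero: any ink at all in the top half wins
        return top_pixels > 0
    return top_pixels / bottom_pixels > min_ratio

# Counts reported in this answer
print(is_upside_down(4035, 3389))  # upside-down sample
print(is_upside_down(3209, 4206))  # correctly oriented sample
```

A margin can help when the two counts are close, since a near tie carries little evidence either way. The right value depends on the fonts and layout involved.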

Now that we have detected whether the image is upside down, we can rotate it with this function:

def rotate(image, angle):
    # Obtain the dimensions of the image
    (height, width) = image.shape[:2]
    (cX, cY) = (width / 2, height / 2)

    # Grab the rotation components of the matrix
    matrix = cv2.getRotationMatrix2D((cX, cY), -angle, 1.0)
    cos = np.abs(matrix[0, 0])
    sin = np.abs(matrix[0, 1])

    # Find the new bounding dimensions of the image
    new_width = int((height * sin) + (width * cos))
    new_height = int((height * cos) + (width * sin))

    # Adjust the rotation matrix to take into account translation
    matrix[0, 2] += (new_width / 2) - cX
    matrix[1, 2] += (new_height / 2) - cY

    # Perform the actual rotation and return the image
    return cv2.warpAffine(image, matrix, (new_width, new_height))

Rotate the image

rotated = rotate(original_image, 180)
cv2.imshow("rotated", rotated)

This gives us the correct result

These are the pixel counts when the image is correctly oriented:

('top', 3209)
('bottom', 4206)

Full code

import numpy as np
import cv2

def rotate(image, angle):
    # Obtain the dimensions of the image
    (height, width) = image.shape[:2]
    (cX, cY) = (width / 2, height / 2)

    # Grab the rotation components of the matrix
    matrix = cv2.getRotationMatrix2D((cX, cY), -angle, 1.0)
    cos = np.abs(matrix[0, 0])
    sin = np.abs(matrix[0, 1])

    # Find the new bounding dimensions of the image
    new_width = int((height * sin) + (width * cos))
    new_height = int((height * cos) + (width * sin))

    # Adjust the rotation matrix to take into account translation
    matrix[0, 2] += (new_width / 2) - cX
    matrix[1, 2] += (new_height / 2) - cY

    # Perform the actual rotation and return the image
    return cv2.warpAffine(image, matrix, (new_width, new_height))

image = cv2.imread("1.PNG")
original_image = image.copy()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.threshold(blurred, 110, 255, cv2.THRESH_BINARY_INV)[1]
cv2.imshow("thresh", thresh)

x, y, w, h = 0, 0, image.shape[1], image.shape[0]

# Integer division keeps the coordinates valid as slice indices in Python 3
top_half = ((x, y), (x + w, y + h // 2))
bottom_half = ((x, y + h // 2), (x + w, y + h))

top_x1,top_y1 = top_half[0]
top_x2,top_y2 = top_half[1]
bottom_x1,bottom_y1 = bottom_half[0]
bottom_x2,bottom_y2 = bottom_half[1]

# Split into top/bottom ROIs
top_image = thresh[top_y1:top_y2, top_x1:top_x2]
bottom_image = thresh[bottom_y1:bottom_y2, bottom_x1:bottom_x2]

cv2.imshow("top_image", top_image)
cv2.imshow("bottom_image", bottom_image)

# Count non-zero array elements
top_pixels = cv2.countNonZero(top_image)
bottom_pixels = cv2.countNonZero(bottom_image)

print('top', top_pixels)
print('bottom', bottom_pixels)

# Rotate if upside down
if top_pixels > bottom_pixels:
    rotated = rotate(original_image, 180)
    cv2.imshow("rotated", rotated)

cv2.waitKey(0)