检测文本图像是否颠倒

sin*_*ium 35 python opencv image-rotation skew

我有几百张图像(扫描的文档),其中大多数是歪斜的。我想使用Python使它们偏斜。
这是我使用的代码:

import numpy as np
import cv2

from skimage.transform import radon


filename = 'path_to_filename'
# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, COLOR_BGR2GRAY)
h, w = I.shape
# If the resolution is high, resize the image to reduce processing time.
if (w > 640):
    I = cv2.resize(I, (640, int((h / w) * 640)))
I = I - np.mean(I)  # Demean; make the brightness extend above and below zero
# Do the radon transform
sinogram = radon(I)
# Find the RMS value of each row and find "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))

# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w/2,h/2),90 - rotation,1)
dst = cv2.warpAffine(img,M,(w,h))
cv2.imwrite('rotated.jpg', dst)
Run Code Online (Sandbox Code Playgroud)

该代码对大多数文档都适用,除了某些角度:(180和0)和(90和270)通常被检测为相同角度(即,它不会使(180和0)与(90和90之间产生差异)。 270))。所以我得到了很多颠倒的文件。

这是一个例子:
在此处输入图片说明

我得到的结果图像与输入图像相同。

是否有建议使用Opencv和Python检测图像是否颠倒?
PS:我尝试使用EXIF数据检查方向,但没有找到任何解决方案。


编辑:
可以使用Tesseract(Python的pytesseract)检测方向,但是仅当图像包含很多字符时才有可能。
对于可能需要此功能的任何人:

import cv2
import pytesseract


print(pytesseract.image_to_osd(cv2.imread(file_name)))
Run Code Online (Sandbox Code Playgroud)

如果文档包含足够的字符,则Tesseract可以检测方向。但是,当图像的线条很少时,Tesseract建议的定向角度通常是错误的。因此,这不是100%的解决方案。

Ste*_*hke 21

Python3 / OpenCV4脚本可对齐扫描的文档。

旋转文档并汇总行。当文档旋转0度和180度时,图像中会出现很多黑色像素:

旋转以找到最大的斑马

使用分数保持方法。对每个图像进行评分,以使其类似于斑马纹。得分最高的图像具有正确的旋转度。您链接的图像偏离了0.5度。为了便于阅读,我省略了一些功能,完整的代码可以在此处找到

# Rotate the image around in a circle
angle = 0
while angle <= 360:
    # Rotate the source image
    img = rotate(src, angle)    
    # Crop the center 1/3rd of the image (roi is filled with text)
    h,w = img.shape
    buffer = min(h, w) - int(min(h,w)/1.15)
    roi = img[int(h/2-buffer):int(h/2+buffer), int(w/2-buffer):int(w/2+buffer)]
    # Create background to draw transform on
    bg = np.zeros((buffer*2, buffer*2), np.uint8)
    # Compute the sums of the rows
    row_sums = sum_rows(roi)
    # High score --> Zebra stripes
    score = np.count_nonzero(row_sums)
    scores.append(score)
    # Image has best rotation
    if score <= min(scores):
        # Save the rotatied image
        print('found optimal rotation')
        best_rotation = img.copy()
    k = display_data(roi, row_sums, buffer)
    if k == 27: break
    # Increment angle and try again
    angle += .75
cv2.destroyAllWindows()
Run Code Online (Sandbox Code Playgroud)

最佳旋转

如何判断文件是否颠倒?填写从文档顶部到图像中第一个非黑色像素的区域。用黄色测量面积。面积最小的图像将是正面朝上的图像:

右边向上上下翻转

# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is larger
if right_side_up < upside_down: aligned_image = best_rotation
else: aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()
Run Code Online (Sandbox Code Playgroud)

  • 对于颠倒检测,您可以利用以下事实:由于大写字母和诸如t,h,k之类的字母的频率而产生晕轮效应。在静止图像上方,光环在白色带下方。即,在白色条带之间切成两半的区域总和需要在两个总和中更亮。 (3认同)
  • 这是一个很好的答案。但是倒置检测可能会在每章的最后一页等处失败。我想您还可以对左右边距进行类似的分析,因为段落结尾的插入平均比段落开头更深。 (2认同)

Oba*_*bay 6

Assuming you did run the angle-correction already on the image, you can try the following to find out if it is flipped:

  1. Project the corrected image to the y-axis, so that you get a 'peak' for each line. Important: There are actually almost always two sub-peaks!
  2. Smooth this projection by convolving with a gaussian in order to get rid of fine structure, noise, etc.
  3. For each peak, check if the stronger sub-peak is on top or at the bottom.
  4. Calculate the fraction of peaks that have sub-peaks on the bottom side. This is your scalar value that gives you the confidence that the image is oriented correctly.

The peak finding in step 3 is done by finding sections with above average values. The sub-peaks are then found via argmax.

Here's a figure to illustrate the approach; A few lines of you example image

  • Blue: Original projection
  • Orange: smoothed projection
  • Horizontal line: average of the smoothed projection for the whole image.

bla

here's some code that does this:

import cv2
import numpy as np

# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]

# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)

# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]

# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
    line = smooth[start:end]
    if np.argmax(line) < len(line)/2:
        lower_peaks += 1

print(lower_peaks / len(line_starts))
Run Code Online (Sandbox Code Playgroud)

this prints 0.125 for the given image, so this is not oriented correctly and must be flipped.

Note that this approach might break badly if there are images or anything not organized in lines in the image (maybe math or pictures). Another problem would be too few lines, resulting in bad statistics.

Also different fonts might result in different distributions. You can try this on a few images and see if the approach works. I don't have enough data.