Unable to read text from an image using pytesseract.image_to_string

Question

vis*_*219 2 python captcha opencv python-imaging-library python-tesseract

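The problem here is that I need to remove the lines and write code to recognize the characters. The solutions I have seen so far deal with solid characters, but this image has characters with a double border.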

Answer 1

Han*_*rse 5

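For this specific captcha, there is a quite simple solution. However, given the "nature" of captchas already mentioned in the comments, and as is usual for image-processing tasks with limited input data, there is no guarantee that this approach works for other captchas, even very similar ones.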

Read the image as grayscale.
Threshold the image at a nearly white cutoff.
Flood fill the "background" with black.
Run pytesseract with the --psm 6 option.
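
To see why the threshold and flood fill steps turn a double-bordered character into a solid one, here is a toy sketch in pure Python. It is not the answer's code: `threshold` and `flood_fill` are hypothetical stand-ins that mimic what `cv2.threshold` with `cv2.THRESH_BINARY` and `cv2.floodFill` with its default 4-connectivity do, and the 7×7 grid is made-up data assuming a white background, a dark outline, a white interior, and one gray noise pixel.

```python
from collections import deque

def threshold(gray, cutoff=224):
    """Binary threshold: pixels brighter than cutoff become 255, all others 0."""
    return [[255 if px > cutoff else 0 for px in row] for row in gray]

def flood_fill(img, seed, new_value):
    """Return a copy of img with the 4-connected region containing
    seed (row, col) repainted to new_value."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    r0, c0 = seed
    old_value = out[r0][c0]
    if old_value == new_value:
        return out
    out[r0][c0] = new_value
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and out[nr][nc] == old_value:
                out[nr][nc] = new_value
                queue.append((nr, nc))
    return out

# Made-up test image: W = white, D = dark outline, N = gray noise pixel
W, D, N = 255, 40, 150
gray = [
    [W, W, W, W, W, W, W],
    [W, D, D, D, D, D, W],
    [W, D, W, W, W, D, W],
    [N, D, W, W, W, D, W],
    [W, D, W, W, W, D, W],
    [W, D, D, D, D, D, W],
    [W, W, W, W, W, W, W],
]

binary = threshold(gray)                # outline and noise turn black
solid = flood_fill(binary, (0, 0), 0)   # outer white background turns black

for row in solid:
    print("".join("#" if px else "." for px in row))
```

After the threshold, the outline and the gray noise pixel are black, while the background and the interior are both white. The flood fill starts in the top-left corner and cannot cross the black outline, so only the 3×3 interior stays white. The sketch also hints at a limitation: a noise line that cuts the background into disconnected parts would leave one part unfilled.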

Here's the whole code:

import cv2
import pytesseract

# Read image as grayscale
img = cv2.imread('FuZEJ.png', cv2.IMREAD_GRAYSCALE)

# Threshold at nearly white cutoff
thr = cv2.threshold(img, 224, 255, cv2.THRESH_BINARY)[1]

# Floodfill "background" with black
ff = cv2.floodFill(thr, None, (0, 0), 0)[1]

# OCR using pytesseract
text = pytesseract.image_to_string(ff, config='--psm 6').replace('\n', '').replace('\f', '')
print(text)
# xwphs
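
The two `replace` calls are there because Tesseract's text output typically ends with a newline and a form feed (its page separator). A slightly more general variant is sketched below; `clean_ocr_text` is a hypothetical helper, not part of pytesseract. Unlike the original calls, it also drops spaces inside the text, which is only appropriate when the expected result is a single word, as in a captcha.

```python
def clean_ocr_text(raw: str) -> str:
    """Remove all whitespace (spaces, newlines, form feeds) from OCR output."""
    # str.split() without arguments splits on any run of whitespace,
    # which includes '\n' and '\f'; joining the pieces drops it all.
    return "".join(raw.split())

cleaned = clean_ocr_text("xwphs\n\f")
print(cleaned)
```

In the code above, this would be used as `clean_ocr_text(pytesseract.image_to_string(ff, config='--psm 6'))`.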

Caveat: I use a special version of Tesseract from the Mannheim University Library.

----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.16299-SP0
Python:        3.9.1
PyCharm:       2021.1.1
OpenCV:        4.5.1
pytesseract:   5.0.0-alpha.20201127
----------------------------------------
