Detecting spaces between words in OCR using Python and OpenCV

Question

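I am new to Python and OpenCV. I am currently doing OCR with Python and OpenCV, without Tesseract. So far I can detect the text (characters and digits), but I cannot detect the spaces between words. For example, if the image says "Hello John", it detects "hello john" but not the space between the two words, so my output is "HelloJohn" with no space. The code I use to extract the contours is shown below (I have imported all the required modules; this is the main part that extracts the contours):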

imgGray = cv2.cvtColor(imgTrainingNumbers, cv2.COLOR_BGR2GRAY)
imgBlurred = cv2.GaussianBlur(imgGray, (5, 5), 0)

# inverted adaptive threshold: text becomes white on a black background
imgThresh = cv2.adaptiveThreshold(imgBlurred,
                                  255,
                                  cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                  cv2.THRESH_BINARY_INV,
                                  11,    # block size
                                  2)     # constant subtracted from the weighted mean

cv2.imshow("imgThresh", imgThresh)

imgThreshCopy = imgThresh.copy()

# [-2:] keeps (contours, hierarchy) under both OpenCV 3.x and 4.x
npaContours, npaHierarchy = cv2.findContours(imgThreshCopy,
                                             cv2.RETR_EXTERNAL,
                                             cv2.CHAIN_APPROX_SIMPLE)[-2:]

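After that, I classify the extracted digit and character contours. Please help me detect the spaces between them. Thank you in advance; your reply would be very helpful.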

Answer 1

alk*_*asm 6

Since you did not provide an example image, I generated a simple one to test with:

import cv2
import numpy as np

# white text on a black background
h, w = 100, 600
img = np.zeros((h, w), dtype=np.uint8)
font = cv2.FONT_HERSHEY_SIMPLEX
cv2.putText(img, 'OCR with OpenCV', (30, h-30), font, 2, 255, 2, cv2.LINE_AA)

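As I mentioned in the comments, if you simply dilate your image, the white areas expand. If you do this with a kernel large enough that nearby letters merge, but small enough that separate words do not, you can extract a contour for each word and use it to mask one word at a time for OCR.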

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
dilated = cv2.dilate(img, kernel)

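The 15x15 kernel above suits the test image; for other fonts and sizes the width has to be tuned. The following is a minimal sketch of one way to estimate it, not part of the original answer. It assumes you already have one bounding box per character on a single line (for example from `cv2.boundingRect` on the undilated contours) and that the gaps between words are clearly wider than the gaps between letters. The helper names `horizontal_gaps` and `suggest_kernel_width` are hypothetical.

```python
def horizontal_gaps(char_boxes):
    """Return the background gaps (in pixels) between consecutive boxes.

    Each box is (x, y, w, h), the layout returned by cv2.boundingRect.
    Boxes are assumed to lie on a single text line.
    """
    boxes = sorted(char_boxes, key=lambda b: b[0])
    gaps = []
    for (x0, _, w0, _), (x1, _, _, _) in zip(boxes, boxes[1:]):
        # overlapping boxes (e.g. the dot of an "i") count as a gap of 0
        gaps.append(max(0, x1 - (x0 + w0)))
    return gaps


def suggest_kernel_width(char_boxes):
    """Suggest a kernel width between the letter gaps and the word gaps.

    Heuristic: sort the gaps and split them at the largest jump. Gaps
    below the jump are treated as letter spacing, gaps above it as word
    spacing.
    """
    gaps = sorted(horizontal_gaps(char_boxes))
    if len(gaps) < 2:
        raise ValueError("need at least three character boxes")
    # index of the largest jump between consecutive sorted gaps
    split = max(range(len(gaps) - 1), key=lambda i: gaps[i + 1] - gaps[i])
    letter_gap, word_gap = gaps[split], gaps[split + 1]
    if letter_gap == word_gap:
        raise ValueError("all gaps are equal; cannot tell words apart")
    # A kernel of width k bridges gaps of roughly k - 1 pixels, so any k
    # with letter_gap < k <= word_gap should merge letters but not words.
    return (letter_gap + word_gap) // 2 + 1


# Two "words": three letters, a wide gap, then two letters.
boxes = [(0, 0, 10, 20), (14, 0, 10, 20), (28, 0, 10, 20),
         (68, 0, 10, 20), (82, 0, 10, 20)]
width = suggest_kernel_width(boxes)
```

The result could then be passed as the kernel width, for example `cv2.getStructuringElement(cv2.MORPH_RECT, (width, 3))`. A wide, short kernel like this merges mostly in the horizontal direction, which may help keep adjacent text lines apart. The "bridges about k - 1 pixels" rule is an approximation that fits rectangular kernels best, so treat the suggested value as a starting point and check the dilated image.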

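To get a mask for each word individually, just find the contours of these larger blobs. You can also sort the contours vertically, horizontally, or both, so that you get the words in the correct order. Since I only have a single line here, I will just sort in the x direction: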

# [-2] selects the contour list in both OpenCV 3.x (three return values) and 4.x (two)
contours = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]
# sort left to right by each contour's smallest x coordinate
contours = sorted(contours, key=lambda c: c[:, 0, 0].min())

for i in range(len(contours)):

    mask = np.zeros((h, w), dtype=np.uint8)

    # i is the contour to draw, -1 means fill the contours
    mask = cv2.drawContours(mask, contours, i, 255, -1)
    masked_img = cv2.bitwise_and(img, img, mask=mask)

    cv2.imshow('Masked single word', masked_img)
    cv2.waitKey()

    # do your OCR here on the masked image

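Sorting by x alone only works for a single line. For several lines, the "vertically and horizontally" ordering mentioned above could look like the sketch below, which is an illustration rather than part of the original answer. It works on the word bounding boxes (for example `cv2.boundingRect(c)` for each word contour), groups them into rows by vertical centre, and then joins the recognised words with spaces. The names `reading_order`, `assemble_text`, and the `recognise` callable (a stand-in for your own classifier) are hypothetical, and the row tolerance of half a box height is an assumption you may need to tune.

```python
def reading_order(boxes, row_tolerance=0.5):
    """Group (x, y, w, h) boxes into text rows, each sorted left to right.

    A box joins the current row when its vertical centre is within
    row_tolerance * box height of that row's first box; otherwise it
    starts a new row. Rows are returned top to bottom.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b[1] + b[3] / 2):
        cy = box[1] + box[3] / 2
        if rows and abs(cy - rows[-1]["cy"]) < row_tolerance * box[3]:
            rows[-1]["boxes"].append(box)
        else:
            rows.append({"cy": cy, "boxes": [box]})
    return [sorted(row["boxes"], key=lambda b: b[0]) for row in rows]


def assemble_text(boxes, recognise):
    """Join recognised words with spaces and rows with newlines.

    `recognise` is a hypothetical callable mapping a word box to its text.
    """
    return "\n".join(
        " ".join(recognise(box) for box in row)
        for row in reading_order(boxes)
    )


# Four word boxes on two lines, given in scrambled order.
labels = {
    (200, 12, 80, 30): "John",
    (10, 10, 90, 30): "Hello",
    (10, 60, 70, 30): "Good",
    (100, 58, 120, 30): "morning",
}
text = assemble_text(list(labels), labels.get)
```

Because each dilated blob corresponds to one word, the space between words never has to be detected as a character: it is simply inserted when the per-word results are joined.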
