OpenCV - Detecting handwritten marks in checkboxes on a survey questionnaire

Question


Lin*_*ing 5 python opencv image-processing computer-vision omr

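I am working with a large number of patient intake questionnaires. Here is a scanned example of one. I need to process them and store the results in a database, but I am having trouble detecting the handwritten marks: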


[Image: patient intake questionnaire]


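The questionnaire contains different types of marks. Some checkboxes are filled in black. Others contain a tick or a cross. All of these marks mean the checkbox is selected. I need to use OpenCV to identify which boxes are checked.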


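I tried optical character recognition, but the results were not really helpful. The marks come in too many shapes, so OCR recognizes them as different characters. I need to figure out which boxes in the questionnaire are checked. cv2 can probably solve this, but I don't know how.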


# Expected input: An image of Questionnaire

# Expected output:
Have you seen other health care providers for your problems of dizziness
and/or imbalance? [selected] Yes [unselected] No

Have you been through a program of Vestibular and Balance Rehabilitation
Therapy? [selected] Yes [unselected] No

=============================
[unselected] vertigo
[unselected] falling
...
[selected] Drunk-like

=============================
[selected] Vertigo
[selected] Falling
[selected] Fatigue
[selected] Wooziness
[selected] Spinning
[unselected] Disconnected


I previously tried the Python tesseract OCR package:


O Vertigo           O Falling              O Fatigue                 W Vertigo          YA Falling             y[ Fatigue
[ Wooziness     O Spinning         O Disconnected       A \Wooziness     Q Spinning         [ Disconnected
O Imbalance      B Drunk-like        O Swirling             O Imbalance      O Drunk-like       @ Swirling      ;
O Faint            [ Rocking        O Can’tfocus         M Faint           4 Rocking          O Can’t focus
O Lightheaded O Swaying -~ . -0 Unsteady       O Lightheaded O Swaying       N Unsteady
O “onaboat” O Swimming sensation                      Weonaboat” @ Swimming sensation
O Other:                                                        0 Other:


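My idea is this: if OCR recognizes a rectangular checkbox as the letter "O" or the digit "0", the checkbox should be treated as unselected; otherwise it should be treated as selected. With that rule, I can detect the handwritten marks from the OCR results. I will test some samples and check the accuracy, although I am not sure this will work. If it does, I will report back in this post later.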
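A minimal sketch of that rule, assuming the option labels for each row are known in advance and that OCR emits the marker as a separate token right before its label. The names `UNCHECKED_MARKERS`, `classify_marker`, and `parse_row` are hypothetical helpers for illustration, not part of any OCR library.

```python
# Markers treated as "empty box", following the rule proposed in the question.
UNCHECKED_MARKERS = {"O", "0"}


def classify_marker(marker):
    """Map the OCR token found in front of a label to a checkbox state."""
    if not marker:
        return "unknown"  # no token in front of the label
    return "unselected" if marker in UNCHECKED_MARKERS else "selected"


def parse_row(ocr_line, labels):
    """Walk through the known labels in order and classify the token
    that immediately precedes each one in the OCR text."""
    results = []
    pos = 0
    for label in labels:
        idx = ocr_line.find(label, pos)
        if idx == -1:
            results.append((label, "unknown"))  # label not found in OCR text
            continue
        tokens = ocr_line[pos:idx].split()
        marker = tokens[-1] if tokens else ""
        results.append((label, classify_marker(marker)))
        pos = idx + len(label)
    return results


# First row of the OCR output shown above (whitespace shortened).
row = "O Vertigo   O Falling   O Fatigue   W Vertigo   YA Falling   y[ Fatigue"
labels = ["Vertigo", "Falling", "Fatigue"] * 2
for label, state in parse_row(row, labels):
    print(f"[{state}] {label}")
```

The rule is only as reliable as the OCR itself. In the sample output some boxes were read as `[`, which this sketch counts as selected, and a marker fused with its label (as in `Weonaboat”`) would also be misread, so results would need checking against real scans.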


Answer 1

小智 1

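From the sample, it is evident that black dominates the area of a marked checkbox. You can use OCR to detect the text and, from it, determine the position and orientation of the checkbox areas (assuming your scans are never precisely aligned). I would then suggest simply computing the average pixel value within each checkbox area. The area does not have to be located with 100% precision, as long as you average over the size of the area.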
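The averaging step could be sketched as follows, assuming the page is already a grayscale NumPy array (0 = black, 255 = white) and each checkbox location is known as an `(x, y, w, h)` rectangle. The helper names, the `threshold` of 240, and the `margin` parameter are illustrative assumptions that would need tuning on real scans; the demo uses synthetic images rather than an actual questionnaire.

```python
import numpy as np


def mean_intensity(gray, box, margin=0):
    """Average pixel value inside the box, optionally shrunk by `margin`
    pixels on each side to keep the printed border out of the average."""
    x, y, w, h = box
    roi = gray[y + margin : y + h - margin, x + margin : x + w - margin]
    return float(roi.mean())


def is_checked(gray, box, threshold=240.0, margin=0):
    """Treat the box as marked when its average is darker than `threshold`."""
    return mean_intensity(gray, box, margin) < threshold


# Synthetic demo: a white page, and a copy with a cross drawn inside the box.
box = (10, 10, 20, 20)
empty = np.full((40, 40), 255, dtype=np.uint8)
marked = empty.copy()
for i in range(20):
    marked[10 + i, 10 + i] = 0  # one diagonal stroke
    marked[10 + i, 29 - i] = 0  # the other diagonal stroke

print(is_checked(empty, box), is_checked(marked, box))
```

For a real scan, the array could come from `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`. A thin tick darkens the average far less than a filled box, so the threshold is best chosen by comparing against boxes known to be empty on the same form.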

Archived:	6 years, 7 months ago
Views:	3117
Last activity:	5 years, 2 months ago