pytesseract fails to recognize complex math formulas in an image

Question


Sum*_*tel 3 algorithm image-processing python-3.x

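I am using the pytesseract module in Python. pytesseract recognizes ordinary text in an image, but it does not work on images that contain complex math formulas such as roots, derivatives, integrals, or equations.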

Code 2.py

    # Import modules
    from PIL import Image
    import pytesseract
    import cv2

    # Include tesseract executable in your path
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

    # Create an image object of PIL library
    image = Image.open('23.jpg')

    # img = cv2.imread('123.jpg')
    # pass image into pytesseract module

    # pytesseract is trained in many languages
    image_to_text = pytesseract.image_to_string(image, lang='eng+equ')

    image_to_text1 = pytesseract.image_to_string(image)

    # Print the text
    print(image_to_text)
    # print(image_to_text1)


    # workon digits

Output:

    242/33
    2x

    2x+3X

    2X+3x=4

    2x?-3x +1=0
    (x-1)(x+1) =x2-1
    (x+2)/((x+3)(x-4))

    7-4=3
    V(x/2) =3

    2xx—343=6x—3 (x#3)

    Jeeta =e* +e

    dy 2
    S=2?-3
    dz ¥

    dy = (a? — 3)dx

Input image

Answer 1

yvs*_*yvs 6

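To use the MATH language, you need to install the correct language data for tesseract. In your case that is "equ", available from https://github.com/tesseract-ocr/tessdata/raw/3.04.00/equ.traineddata . The full list of available languages is at https://tesseract-ocr.github.io/tessdoc/Data-Files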

I am not familiar with installing tesseract languages on Windows, but the documentation at https://github.com/tesseract-ocr/tesseract/wiki says:

If you want to use another language, download the appropriate training data, unpack it using 7-zip, and copy the .traineddata file into the "tessdata" directory, probably C:\Program Files\Tesseract-OCR\tessdata
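Before tuning anything else, it may be worth confirming that every language named in `lang='eng+equ'` actually has a `.traineddata` file in the tessdata directory. The following is a minimal sketch under stated assumptions: the directory is the default location quoted above, and `missing_languages` is a hypothetical helper name, not part of pytesseract.

```python
from pathlib import Path


def missing_languages(lang_spec, tessdata_dir):
    """Return the languages in a spec such as 'eng+equ' that have no
    <name>.traineddata file in tessdata_dir (hypothetical helper)."""
    tessdata = Path(tessdata_dir)
    names = [name for name in lang_spec.split("+") if name]
    return [name for name in names
            if not (tessdata / f"{name}.traineddata").is_file()]


# Assumed default location taken from the quoted documentation; adjust as needed.
TESSDATA_DIR = r"C:\Program Files\Tesseract-OCR\tessdata"

if __name__ == "__main__":
    missing = missing_languages("eng+equ", TESSDATA_DIR)
    if missing:
        print("Missing traineddata for:", ", ".join(missing))
    else:
        print("All requested languages are installed.")
```

If `equ` shows up as missing, copying `equ.traineddata` into that directory as described above should be the first thing to try.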

First try processing the image with the CLI alone (without Python), because the CLI exposes the full list of options you can tune.
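If you still want the experiment to be repeatable, the CLI can be driven from a small script. This is only a sketch: it assumes `tesseract` is on your PATH, `build_command` and `run_cli` are hypothetical helper names, and `--psm 6` (treat the image as a single uniform block of text) is just one example value to experiment with. `--psm` is the spelling used by Tesseract 4 and later; 3.x releases used `-psm`.

```python
import shutil
import subprocess


def build_command(image_path, lang="eng+equ", psm=6, exe="tesseract"):
    """Build a tesseract CLI call that writes the recognized text to stdout."""
    return [exe, str(image_path), "stdout", "-l", lang, "--psm", str(psm)]


def run_cli(image_path, **kwargs):
    """Run tesseract on image_path and return the text it prints."""
    cmd = build_command(image_path, **kwargs)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `run_cli("23.jpg", psm=4)` and `run_cli("23.jpg", psm=6)` side by side is one way to compare how different page segmentation modes affect the output for the same image.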

Archived: 5 years, 10 months ago
Views: 5817
Last activity: 4 years, 9 months ago