Tesseract returns nothing for Arabic words/letters

Question


I have installed Pytesseract, and it handles French/English text and digits perfectly. But when I try to read any Arabic text or letters, it returns nothing.

Here is the code I used:

try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"

print(pytesseract.image_to_string(Image.open('maroc.jpg'), lang='ara'))


This is the letter I want to read:

د

If anyone is able to read it using another method, please help. Thanks!


Answer 1

Eli*_* KL 5

Code:

from PIL import Image
import pytesseract

# image_to_pdf_or_hocr returns the hOCR markup as bytes
print(pytesseract.image_to_pdf_or_hocr('test.png', lang='ara', extension='hocr'))


Get the new Arabic tessdata from here:
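Once the Arabic data file (`ara.traineddata`) is in the tessdata directory, it may help to confirm that Tesseract actually sees it before running OCR. The following is a minimal sketch, not a confirmed fix: the helper names `missing_languages` and `ocr_single_char` are hypothetical, `pytesseract.get_languages` is assumed to be available (it exists in recent pytesseract releases), and the use of `--psm 10` (treat the image as a single character) is an assumption that suits an image containing one isolated letter such as the one in the question.

```python
def missing_languages(required, available):
    """Return the language codes in `required` that Tesseract does not list."""
    return [code for code in required if code not in available]


def ocr_single_char(image_path, lang="ara"):
    """OCR an image that is assumed to contain one isolated character."""
    # Imported here so the helper above can be used without Tesseract installed.
    import pytesseract
    from PIL import Image

    # Ask Tesseract which traineddata files it can find.
    available = pytesseract.get_languages(config="")
    missing = missing_languages([lang], available)
    if missing:
        raise RuntimeError(f"Tesseract has no language data for: {missing}")

    # --psm 10 tells Tesseract to treat the whole image as a single character.
    text = pytesseract.image_to_string(
        Image.open(image_path), lang=lang, config="--psm 10"
    )
    return text.strip()
```

With the setup from the question, `ocr_single_char('maroc.jpg')` would either raise an error naming the missing language data or return whatever Tesseract recognizes in the image. For full lines or paragraphs of Arabic, the default page segmentation mode is probably a better starting point than `--psm 10`.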

Archived:	6 years, 8 months ago
Viewed:	8658 times
Last activity:	1 year, 8 months ago