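I'm trying to get my program to recognize Chinese with Tesseract, and the recognition itself works. The only problem is that the result is printed in pinyin (the romanized spelling used to type Chinese with Latin letters) instead of as Chinese characters.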
# Import libraries
from PIL import Image
import pytesseract
from unidecode import unidecode
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image_counter = 2
filelimit = image_counter - 1
outfile = "out_text.txt"
f = open(outfile, "a")
for i in range(1, filelimit + 1):
    print("ran")
    filename = "page_" + str(i) + ".png"
    # Recognize the text in the image as a string using pytesseract
    text = unidecode(((pytesseract.image_to_string(Image.open(filename), lang = "chi_sim"))))
    print(text)
This is the image I ran it on.
This is what I get:
ran
Qing Ming Shi Jie Yu Fen Fen , Lu Shang Xing Ren Yu Duan Que
Xin Wen Jiu Jia He Chu You , Mu Yi Tong Zhi Qiang Hua Cun .
The result should be Chinese characters, as shown in the image.
Never mind, I figured out my problem. This line:
text = unidecode(((pytesseract.image_to_string(Image.open(filename), lang = "chi_sim"))))
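should be the following, without the unidecode call, which transliterates the Chinese characters into ASCII pinyin: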
text = pytesseract.image_to_string(Image.open(filename), lang = "chi_tra")
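
For anyone who also wants the recognized text saved to the output file, here is a minimal sketch of the corrected loop. The names `ocr_image` and `ocr_pages`, and the `ocr` parameter, are my own and not part of pytesseract. The `page_<n>.png` naming and the `out_text.txt` file come from the question. I'm assuming the matching language data (`chi_sim` or `chi_tra`) is installed, and that `pytesseract.pytesseract.tesseract_cmd` is set as in the question if Tesseract is not on your PATH.

```python
def ocr_image(filename, lang="chi_sim"):
    """Run Tesseract on one image and return the recognized text unchanged."""
    # Imported here so the rest of the sketch can be used without Tesseract installed.
    from PIL import Image
    import pytesseract

    with Image.open(filename) as img:
        # No unidecode() here: the Chinese characters are returned as-is.
        return pytesseract.image_to_string(img, lang=lang)


def ocr_pages(page_count, outfile="out_text.txt", ocr=ocr_image):
    """OCR page_1.png .. page_<page_count>.png and append the text to outfile."""
    texts = []
    # An explicit UTF-8 encoding avoids errors on systems whose default
    # encoding cannot represent Chinese characters.
    with open(outfile, "a", encoding="utf-8") as f:
        for i in range(1, page_count + 1):
            text = ocr(f"page_{i}.png")
            f.write(text)
            texts.append(text)
    return texts
```

Calling `ocr_pages(1)` should process `page_1.png` and append its text to `out_text.txt`. The `ocr` argument is only there so the loop can be tried with a stand-in function. If the console still shows garbled characters when printing, that is probably a terminal encoding issue and not an OCR one.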