How to print Tesseract results in Chinese characters

Question


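I'm trying to get my program to recognize Chinese using Tesseract, and it works. The only problem is that instead of printing the result as Chinese characters, it prints the result in pinyin (the romanization used to type Chinese words with Latin letters).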

# Import libraries
from PIL import Image
import pytesseract
from unidecode import unidecode

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

image_counter = 2

filelimit = image_counter - 1

outfile = "out_text.txt"

f = open(outfile, "a")

for i in range(1, filelimit + 1):
    print("ran")
    filename = "page_" + str(i) + ".png"

    # Recognize the text in the image as a string using pytesseract
    text = unidecode(((pytesseract.image_to_string(Image.open(filename), lang = "chi_sim"))))

    print(text)


This is the image I ran it on:

This is what I got:

ran Qing Ming Shi Jie Yu Fen Fen , Lu Shang Xing Ren Yu Duan Que Xin Wen Jiu Jia He Chu You , Mu Yi Tong Zhi Qiang Hua Cun .

The result should be Chinese characters, as shown in the image.
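
A quick way to tell the two kinds of output apart in code is to check whether the string contains any CJK ideographs. The sketch below is only an illustration: the helper name `contains_cjk` is hypothetical, it only looks at the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), and the Chinese sample string is an assumption inferred from the romanized output, not text taken from the image.

```python
def contains_cjk(text):
    """Return True if text has at least one character in U+4E00..U+9FFF."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


# Start of the romanized output reported in the question
romanized = "Qing Ming Shi Jie Yu Fen Fen"
# Assumed Chinese equivalent, used here only as sample input
chinese = "清明时节雨纷纷"

print(contains_cjk(romanized))  # False
print(contains_cjk(chinese))    # True
```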

Answer 1

Bol*_*wic 5

Never mind, I figured out my problem.

text = unidecode(((pytesseract.image_to_string(Image.open(filename), lang = "chi_sim"))))


should be

text = pytesseract.image_to_string(Image.open(filename), lang = "chi_tra")
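
The change that matters for the pinyin problem is dropping `unidecode`, which transliterates Unicode text into ASCII. The answer also switches the language from `chi_sim` (Simplified) to `chi_tra` (Traditional); whichever is used should match the script in the image. One more thing to watch, assuming the recognized text is meant to end up in `out_text.txt` (the question opens that file but never writes to it): on Windows the default file encoding may not be able to represent Chinese characters, so passing `encoding="utf-8"` explicitly is safer. A minimal sketch follows. `save_pages` and its `recognize` parameter are hypothetical names, and the OCR call is passed in as a function so the file handling can be shown on its own.

```python
def save_pages(filenames, recognize, outfile="out_text.txt"):
    """Append the text recognized from each file to outfile as UTF-8."""
    texts = []
    # encoding="utf-8" keeps Chinese characters intact regardless of the
    # platform's default encoding; "with" closes the file when done.
    with open(outfile, "a", encoding="utf-8") as f:
        for filename in filenames:
            text = recognize(filename)  # raw Unicode, no unidecode() here
            f.write(text + "\n")
            texts.append(text)
    return texts


# Assumed usage with pytesseract, mirroring the line in the answer:
#   from PIL import Image
#   import pytesseract
#   save_pages(
#       ["page_1.png"],
#       lambda name: pytesseract.image_to_string(Image.open(name), lang="chi_tra"),
#   )
```

Keeping the recognizer as a parameter is just a convenience for this sketch; calling `pytesseract.image_to_string` directly inside the loop works the same way.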


Archived:	6 years, 2 months ago
Views:	5,743
Last activity:	6 years, 2 months ago