Is the text in an image bold?

che*_*der 5 python ocr tesseract

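I have been experimenting with Tesseract OCR lately. I can find the characters in an image, but I cannot pick out only the bold ones (that is, tell whether a character in a document image is bold). In another question (Can I use OCR to detect font styles (bold, italic)?) I saw the Tesseract API function WordFontAttributes() mentioned, but I could not get it working in Python.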

小智 0

Install tesseract 3.05 first (version 4 does not support WordFontAttributes).

from tesserocr import PyTessBaseAPI, RIL, iterate_level


def get_words_info(image_path, tessdata_path):
    """
    Take the path to an image and the path to tessdata, and return a list of
    dicts with the font attributes, text, and bounding box of each word.
    """
    with PyTessBaseAPI(path=tessdata_path) as api:
        api.SetImageFile(image_path)
        api.Recognize()
        word_iter = api.GetIterator()  # renamed to avoid shadowing built-in iter()
        level = RIL.WORD

        result = []

        for r in iterate_level(word_iter, level):
            element = r.GetUTF8Text(level)
            word_attributes = r.WordFontAttributes()
            bounding_box = r.BoundingBox(level)

            # Skip empty words and words without font information
            # (WordFontAttributes() may return None).
            if element and word_attributes:
                word_attributes['word'] = element
                word_attributes['position'] = bounding_box
                result.append(word_attributes)

        return result
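To answer the original question (which words are bold), the list returned by get_words_info() still has to be filtered. The sketch below is one possible way to do that. It assumes each entry is a dict carrying a boolean 'bold' key alongside the 'word' key added above; the helper name bold_words and the sample data are hypothetical and only illustrate the shape of the result, so check the keys against what your tesserocr version actually returns.

```python
def bold_words(words_info):
    """Return the text of every word whose attributes mark it as bold.

    Assumes each entry is a dict with 'word' and 'bold' keys; entries that
    are None or lack either key are ignored.
    """
    return [
        info["word"]
        for info in words_info
        if info and info.get("bold") and info.get("word")
    ]


# Hypothetical sample shaped like the output of get_words_info().
sample = [
    {"word": "Invoice", "bold": True, "italic": False},
    {"word": "total", "bold": False, "italic": False},
    None,  # a word for which no font information was available
]

print(bold_words(sample))
```

With real data you would call bold_words(get_words_info(image_path, tessdata_path)) instead of passing the sample list.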