标签: pytesser

Pytesser设置角色白名单

有谁知道如何为Pytesseract设置角色白名单？我希望它只输出Az和0-9.这可能吗？我有以下内容:

img = Image.open('test.jpg')
result = pytesseract.image_to_string(img, config='-psm 6')

Run Code Online (Sandbox Code Playgroud)

我得到其他字符像/为1所以我想限制可能的字符的选项.

python ocr tesseract pytesser

Min*_*o10

2017 07-21

4
推荐指数

1
解决办法

8141
查看次数

在Python中使用OCR从图像中提取文本

我想从图像的特定区域提取文本，例如身份证中的名称和ID号。我要从中提取文本的ID卡是中文（中文ID卡）。我已经尝试过此代码，但是它只是提取了我不需要的地址和出生日期。我只需要姓名和身份证号码。

import cv2
from PIL import Image
import pytesseract
import argparse
import os

image = cv2.imread("E:/face.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
filename = "{}.png".format(os.getpid())
cv2.imwrite(filename,gray)

text = pytesseract.image_to_string(Image.open(filename), lang='chi_sim')
print(text)
os.remove(filename)

Run Code Online (Sandbox Code Playgroud)

我还附加了要从中提取文本的图像。我已尽我所知尝试过但未成功。任何帮助和指导将不胜感激。

python opencv tesseract python-tesseract pytesser

Teh*_*een

2018 07-11

4
推荐指数

1
解决办法

3549
查看次数

安装tesseract-ocr时出现gcc错误

我正在尝试在Mac上运行以下代码。

import Image
enter code here`import pytesseract
im = Image.open('test.png')
print pytesseract.image_to_string(im)

Run Code Online (Sandbox Code Playgroud)

从这里提出以下问题：pytesseract-没有此类文件或目录错误，我需要安装tesseract-ocr

但是，当我尝试点安装tesseract-ocr时，出现以下错误：

creating build/temp.macosx-10.5-x86_64-2.7
gcc -fno-strict-aliasing -I//anaconda/include -arch x86_64 -DNDEBUG -g
-fwrapv -O3 -Wall -Wstrict-prototypes -I//anaconda/include/python2.7 -c
tesseract_ocr.cpp -o build/temp.macosx-10.5-x86_64-2.7/tesseract_ocr.o
tesseract_ocr.cpp:264:10: 
fatal error: 'leptonica/allheaders.h' file not found #include "leptonica/allheaders.h"
     ^
1 error generated.
error: command 'gcc' failed with exit status 1

Run Code Online (Sandbox Code Playgroud)

我不知道该怎么办。

python tesseract python-tesseract pytesser

Jas*_*lam

2017 05-23

3
推荐指数

1
解决办法

5029
查看次数

导入pytesseract

我试图将pytesseract用于OCR（从图像中提取文本）。我已经使用以下命令成功安装了pytessearct-

pip install pytessearct

Run Code Online (Sandbox Code Playgroud)

当我尝试再次安装它时，它会清楚地说-

Requirement already satisfied (use --upgrade to upgrade): 
pytesseract in ./site-packages

Run Code Online (Sandbox Code Playgroud)

这意味着pytessearct已成功安装。当我尝试使用-在我的iPython笔记本中导入此软件包时-

import pytessearct

Run Code Online (Sandbox Code Playgroud)

引发错误-

ImportError: No module named pytesseract

Run Code Online (Sandbox Code Playgroud)

为什么会这样呢？

python pip ipython pytesser jupyter-notebook

Com*_*ata

2016 08-15

3
推荐指数

1
解决办法

8569
查看次数

无法找到 Tesseract 的 tessdata

嗨，我是 python 和 tesseract 的新手。我正在使用 anaconda 发行版，并尝试使用 pytesseract-ocr 当我尝试从图像获取数据时，它给出以下错误：

tesseract imageSample1.jpg test.txt digits
// output 
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Error opening data file /anaconda/envs/_build/share/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Run Code Online (Sandbox Code Playgroud)

所以首先没有这样的/anaconda/envs/_build/share/tessdata/目录。我有 anaconda3 文件夹。我从 git 下载了 end.traindata 。但不确定将该数据放在哪里。难道我做错了什么。需要一些帮助。谢谢。

python git pytesser

nil*_*ash

lucky-day

3
推荐指数

1
解决办法

8912
查看次数

Pytesseract，尝试检测屏幕上的文本

我将 MSS 与 pytesseract 结合使用，尝试在屏幕上读取以确定正在监视的区域中的字符串。我的代码如下：

import Image
import pytesseract
import cv2
import os
import mss
import numpy as np

with mss.mss() as sct:
    mon = {'top': 0, 'left': 0, 'width': 150, 'height': 150}

    im = sct.grab(mon)
    im = np.asarray(im)
    im_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
    #im_gray = plt.imshow(im_gray, interpolation='nearest')

    cv2.imwrite("test.png", im_gray)
    #cur_dir = os.getcwd()
    text = pytesseract.image_to_string(Image.open(im_gray))
    print(text)

    cv2.imshow("Image", im)
    cv2.imshow("Output", im_gray)
    cv2.waitKey(0)

Run Code Online (Sandbox Code Playgroud)

我返回以下错误： AttributeError: 'numpy.ndarray' object has no attribute 'read'

我还尝试使用 pyplot 将其转换回图像，如代码示例中的注释行所示。然而，这会打印回错误： TypeError: img is not a numpy array, isn't …

python numpy pytesser python-mss

Jus*_*tin

2018 05-07

3
推荐指数

1
解决办法

2万
查看次数

pytesseract 输出未定义

尝试在 python 上运行 tesseract，这是我的代码：

import cv2
import os
import numpy as np
import matplotlib.pyplot as plt
import pytesseract
import Image
# def main():
jpgCounter = 0
    for root, dirs, files in os.walk('/home/manel/Desktop/fotografias etiquetas'):
    for file in files:
        if file.endswith('.jpg'):
        jpgCounter += 1

for i in range(1, 2):

    name                = str(i) + ".jpg"
    nameBW              = str(i) + "_bw.jpg"
    img                 = cv2.imread(name,0) #zero -> abre em grayscale
    # img                 = cv2.equalizeHist(img)
    kernel = np.array([[0,-1,0], [-1,5,-1], [0,-1,0]])
    img = cv2.filter2D(img, -1, kernel)
    cv2.normalize(img,img,0,255,cv2.NORM_MINMAX) …

Run Code Online (Sandbox Code Playgroud)

python ubuntu tesseract python-tesseract pytesser

mbc*_*mbc

lucky-day

3
推荐指数

2
解决办法

5174
查看次数

pytesseract tessedit_char_whitelist 不接受报价

我已经开始在 python 中使用 pytesseract。当我将它传递给单引号或双引号时

from PIL import Image
import pytesseract
import numpy as np

tesseract_config = r"""-c tessedit_char_whitelist=0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ#'<>(){};:"""
tesseract_language = "eng"

text = pytesseract.image_to_string(Image.open('res/outc001.jpg'), lang=tesseract_language, config=tesseract_config)
print text

Run Code Online (Sandbox Code Playgroud)

它返回

Traceback (most recent call last):
  File "main.py", line 15, in <module>
    text = pytesseract.image_to_string(Image.open('res/outc001.jpg'), lang=tesseract_language, config=tesseract_config).split('\n')
  File "/usr/local/lib/python2.7/dist-packages/pytesseract/pytesseract.py", line 193, in image_to_string
    return run_and_get_output(image, 'txt', lang, config, nice)
  File "/usr/local/lib/python2.7/dist-packages/pytesseract/pytesseract.py", line 140, in run_and_get_output
    run_tesseract(**kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pytesseract/pytesseract.py", line 106, in run_tesseract
    command += shlex.split(config)
  File "/usr/lib/python2.7/shlex.py", line 279, in split
    return …

Run Code Online (Sandbox Code Playgroud)

python quotes tesseract python-tesseract pytesser

Mix*_*ony

2018 03-31

3
推荐指数

1
解决办法

3483
查看次数

将pytesseract字符串输出转换为pandas df

我收到了赛百味全天的详细销售、工人等收据，需要为管理课程提取数据。

我拍了收据的照片，然后用 pytesseract 将它们处理成由 \n 分隔的字符串，但现在不知道如何使用 pd.read_csv 和 StringIO 将其转换为数据帧。如果这是最好的方法，请不要这样做。也可能需要使用 cv2 编辑图像，以便更好地处理。

import numpy as np
import pytesseract
from PIL import Image
import pandas as pd

path = 'C:\\attachments\\'

monday = pytesseract.image_to_string(Image.open(path+'file1-1.jpeg'),lang='eng')

from StringIO import StringIO
mon = pd.read_csv(StringIO(monday),sep=r'\s',lineterminator=r'\n')
print(mon)

Run Code Online (Sandbox Code Playgroud)

这是当前星期一的一些变量。

"\nTIME HOURS :\nPERIOD SALES UNITS WORKED PROD SPLH\nZhan emmoo «Ct (iti ;:t‘«é‘«‘i CSD\n3A-4A $0.00 0 0 0 $0.00\n44-54 =: $0.00 SssOO 0 0 $0.00\n5A-6A $0.00 0 0 0 $0.00\nbA-7A $0.00 0 0 0 $0.00\n7A-BA =s«$0.00-Sss«OOs«*O0.80 0 $0.00\nBA-9A 60,00 . …

Run Code Online (Sandbox Code Playgroud)

stringio python-imaging-library pandas pytesser

N.F*_*her

lucky-day

3
推荐指数

1
解决办法

4400
查看次数

在Pytesser中使用多种语言

我已经开始使用Pytesser，它可以同时使用英语和中文，但是有没有办法同时使用两种语言？我需要制作自己的训练数据文件吗？我的代码是：

import Image
from pytesser import *
print image_to_string(Image.open("chinese_and_english.jpg"), lang="eng")
#also want to have chinese be recognized

Run Code Online (Sandbox Code Playgroud)

python ocr tesseract python-tesseract pytesser

Dav*_*Lin

lucky-day

2
推荐指数

1
解决办法

4933
查看次数

图片到文字python

我正在使用python 3.x并使用以下代码将图像转换为文本：

from PIL import Image
from pytesseract import image_to_string

image = Image.open('image.png', mode='r')
print(image_to_string(image))

Run Code Online (Sandbox Code Playgroud)

我收到以下错误：

Traceback (most recent call last):
  File "C:/Users/hp/Desktop/GII/Image_to_text.py", line 12, in <module>
    print(image_to_string(image))
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
    config=config)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
    stderr=subprocess.PIPE)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\subprocess.py", line 950, in __init__
    restore_signals, start_new_session)
  File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\subprocess.py", line 1220, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Run Code Online (Sandbox Code Playgroud)

请注意，我已将图像放在存在python的同一目录中。也不会引发错误，image = Image.open('image.png', mode='r')但会引发错误 print(image_to_string(image))。

知道这里可能有什么问题吗？谢谢

python python-3.x pytesser

mua*_*aiz

lucky-day

2
推荐指数

1
解决办法

2万
查看次数

Pytesseract 转换期间出现“ValueError：无法过滤调色板图像”

遇到有关 Pytesseract 的以下代码的此错误代码的问题。（Python 3.6.1，Mac OSX）

从 PIL 导入 pytesseract 导入请求从 PIL 导入图像从 io 导入 ImageFilter 导入 StringIO, BytesIO

def process_image(url):
    image = _get_image(url)
    image.filter(ImageFilter.SHARPEN)
    return pytesseract.image_to_string(image)


def _get_image(url):
    r = requests.get(url)
    s = BytesIO(r.content)
    img = Image.open(s)
    return img

process_image("https://www.prepressure.com/images/fonts_sample_ocra_medium.png")

Run Code Online (Sandbox Code Playgroud)

错误：

/usr/local/Cellar/python3/3.6.0_1/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/g/pyfo/reddit/ocr.py
Traceback (most recent call last):
  File "/Users/g/pyfo/reddit/ocr.py", line 20, in <module>
    process_image("https://www.prepressure.com/images/fonts_sample_ocra_medium.png")
  File "/Users/g/pyfo/reddit/ocr.py", line 10, in process_image
    image.filter(ImageFilter.SHARPEN)
  File "/usr/local/lib/python3.6/site-packages/PIL/Image.py", line 1094, in filter
    return self._new(filter.filter(self.im))
  File "/usr/local/lib/python3.6/site-packages/PIL/ImageFilter.py", line 53, in filter
    raise ValueError("cannot …

Run Code Online (Sandbox Code Playgroud)

python ocr tesseract typeerror pytesser

gmo*_*onz

2017 04-07

2
推荐指数

1
解决办法

1628
查看次数

pytesseract和image.tif文件

我需要使用pytesseract将具有多个页面的image.tif转录为文本。我有下一个代码：

> From PIL import Image
> Import pytesseract
> Pytesseract.pytesseract.tesseract_cmd = 'C: / Program Files (x86) / Tesseract-
> OCR / tesseract '
> Print (pytesseract.image_to_string (Image.open ('CAMARA.tif'), lang = "spa"))

Run Code Online (Sandbox Code Playgroud)

问题在于只能提取冷杉页面。我如何提取所有这些？

python python-tesseract pytesser

And*_*rés

lucky-day

2
推荐指数

1
解决办法

1403
查看次数