python-3.x python-tesseract
Please download the attachment here and save it as /tmp/target.jpg.
You can see that the image contains 0244R. I extract the string with the following Python code:
from PIL import Image
import pytesseract
import cv2
filename = "/tmp/target.jpg"
image = cv2.imread(filename)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
ret, threshold = cv2.threshold(gray, 55, 255, cv2.THRESH_BINARY)
print(pytesseract.image_to_string(threshold))
What I get is:
0244K
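The correct string is 0244R. How can I give the image higher contrast, convert it to grayscale, and then use PIL and pytesseract to read all of the characters accurately? This is the web page that generates the image: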
If you apply adaptive thresholding and a bitwise-not operation to the input image, the result will be:
Now, if you remove the special characters (dots, commas, and so on):
txt = pytesseract.image_to_string(bnt, config="--psm 6")
res = ''.join(i for i in txt if i.isalnum())
print(res)
The result will be:
O244R
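The filtering step can be tried on its own, without Tesseract installed. Below is a minimal sketch; the helper name `keep_alnum` and the sample string are hypothetical and only stand in for whatever `image_to_string` returns. Note that `str.isalnum()` also accepts non-ASCII letters and digits, so the sketch includes an optional ASCII-only mode in case OCR noise contains such characters.

```python
def keep_alnum(text: str, ascii_only: bool = False) -> str:
    """Drop every character that is not a letter or a digit.

    With ascii_only=True, non-ASCII letters and digits are dropped as well.
    """
    if ascii_only:
        return "".join(ch for ch in text if ch.isascii() and ch.isalnum())
    return "".join(ch for ch in text if ch.isalnum())


# Hypothetical raw OCR output with a stray dot, a newline and a form feed.
sample = "0244R.\n\x0c"
print(keep_alnum(sample))  # prints 0244R
```

The filter only removes characters. It cannot repair a misread character, so the quality of the thresholded image still decides the result.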
Code:
import cv2
import pytesseract
img = cv2.imread("Aw6sN.jpg")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY_INV, 23, 100)
bnt = cv2.bitwise_not(thr)
txt = pytesseract.image_to_string(bnt, config="--psm 6")
res = ''.join(i for i in txt if i.isalnum())
print(res)
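As a rough illustration of what the `adaptiveThreshold` and `bitwise_not` calls in the code above compute, here is a pure-Python sketch of mean adaptive thresholding in inverse-binary mode followed by inversion. It is a simplification rather than OpenCV's implementation: the function name is hypothetical, the image is a plain list of rows, borders are handled by clamping coordinates (an assumption), and OpenCV's integer rounding is not reproduced.

```python
def adaptive_mean_inv_then_not(gray, block_size=23, c=100):
    """Illustrative only. gray is a list of rows of ints in 0..255."""
    h, w = len(gray), len(gray[0])
    r = block_size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Mean of the block_size x block_size neighbourhood,
            # clamping coordinates at the image border (assumption).
            total = 0
            for dy in range(-r, r + 1):
                yy = min(max(y + dy, 0), h - 1)
                for dx in range(-r, r + 1):
                    xx = min(max(x + dx, 0), w - 1)
                    total += gray[yy][xx]
            local_thresh = total / (block_size * block_size) - c
            # Inverse binary: pixels at or below the local threshold become 255.
            inv = 255 if gray[y][x] <= local_thresh else 0
            # bitwise_not on an 8-bit value is 255 - value.
            row.append(255 - inv)
        out.append(row)
    return out


# Tiny synthetic image: bright background with one dark pixel in the centre.
img = [[200] * 5 for _ in range(5)]
img[2][2] = 20
result = adaptive_mean_inv_then_not(img, block_size=3, c=100)
print(result[2])  # prints [255, 255, 0, 255, 255]
```

With a constant as large as 100, only pixels much darker than their neighbourhood mean end up black after the inversion, which is why the dark glyphs survive on a white background.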