ser*_*nte 18 · Tags: python, opencv, image-processing, computer-vision
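From time to time my mom has to go through photos like these, read the number shown in each image, and rename the file to that number.
I'm trying to automate this with OpenCV, Python and Tesseract, but I'm really lost on the part where I extract the region of the image that contains the number. How can I do this? Any suggestions are welcome; I'm new to OpenCV.
I tried to extract the white rectangular board using thresholding and contours, but it didn't work: the RGB values I picked for the threshold don't work on every photo, and I don't know how to choose the right contour.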
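One common alternative to a hand-picked threshold is Otsu's method, which chooses the gray level that best separates a histogram with two brightness groups (for example a light board against a darker background). OpenCV exposes it as `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`; the pure-Python sketch below only spells out the computation on a flat list of 0-255 gray values. It assumes the photo really has two distinct brightness groups, which uneven lighting can break, and it does not address choosing the right contour.

```python
def otsu_threshold(gray_values):
    """Return the threshold t that maximizes the between-class variance
    (up to a constant factor) when pixels <= t form one class and
    pixels > t form the other."""
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of their gray levels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(gray_values, t):
    """Pixels brighter than t become white (255), the rest black (0)."""
    return [255 if v > t else 0 for v in gray_values]
```

Because the threshold is recomputed for every photo, a darker or brighter picture shifts `t` along with it instead of relying on one fixed value.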
Edit:
Have a look at this paper: http://yoni.wexlers.org/papers/2010TextDetection.pdf. It looks interesting.
I've been looking at this and picked up a few ideas along the way...
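Tesseract can accept custom dictionaries, and if you dig a little deeper, it looks like from v3.0 onwards it accepts the command-line parameter digits, which makes it recognize digits only. That seems like a useful idea for your needs.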
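A minimal Python sketch of that idea, assuming the `tesseract` binary is on the PATH and its stock `digits` config file is installed (it ships with Tesseract, but how strictly it is honored has varied between versions). `--psm 7` tells Tesseract to treat the image as a single line of text; the function names are hypothetical.

```python
import shutil
import subprocess


def tesseract_digits_cmd(image_path, psm=7):
    """Command line that prints Tesseract's result to stdout, treats the
    image as a single text line (--psm 7) and applies the `digits` config."""
    # Tesseract 3.x used the single-dash spelling "-psm".
    return ["tesseract", image_path, "stdout", "--psm", str(psm), "digits"]


def keep_digits(text):
    """Drop everything except the ASCII digits 0-9."""
    return "".join(ch for ch in text if ch in "0123456789")


def ocr_digits(image_path, psm=7):
    """Run Tesseract on one image and return only the digits it reported."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on the PATH")
    result = subprocess.run(tesseract_digits_cmd(image_path, psm),
                            capture_output=True, text=True, check=True)
    return keep_digits(result.stdout)
```

Building the argument list separately keeps each option and its value as distinct arguments, which is what Tesseract's option parser expects.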
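It may not even be necessary to find the board with the digits. It may be easier to run Tesseract several times on different slices of the image and let it do the searching itself, since that is what it is designed to do.
So I decided to preprocess the image by turning everything within 25% of black into pure black, and everything else into pure white. That gives preprocessed images like this (image not preserved):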
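The same "near black goes black, everything else goes white" rule can be sketched in plain Python. This is only an approximation of ImageMagick's `-fuzz 25% -fill black -opaque black` followed by `-fill white +opaque black`: it assumes the distance to black is the Euclidean RGB distance scaled to the range 0-1, and it skips the `-normalize` contrast stretch and the `-median` noise filter.

```python
import math

MAX_DIST = math.sqrt(3 * 255 ** 2)  # distance from black to white in RGB


def binarize_near_black(pixels, fuzz=0.25):
    """Map each (r, g, b) pixel to 0 (black) if its scaled distance from
    black is at most `fuzz`, otherwise to 255 (white)."""
    out = []
    for row in pixels:
        out.append([0 if math.sqrt(r * r + g * g + b * b) / MAX_DIST <= fuzz
                    else 255
                    for (r, g, b) in row])
    return out
```

Dark digits survive as black pixels while the board, the animal and most of the background become white, which is the kind of image Tesseract handles best.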
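Next, I generate a series of images and pass them to Tesseract. I assumed the digits are probably between 40% and 10% of the image height, so I loop over strips that are 40, 30, 20 and 10% of the image height. For each strip height, I slide the strip down the image from top to bottom in 20 steps, passing each strip to Tesseract, until the strip reaches the bottom of the image.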
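The strip geometry from that loop can be written out in Python, which makes it easy to check which windows get generated. It mirrors the integer arithmetic of the bash script further down; the `max(1, ...)` guard is an addition that keeps very short images from producing a zero step.

```python
def strip_windows(height, percents=(40, 30, 20, 10), steps=20):
    """Yield (percent, strip_height, top_offset) for strips that slide from
    the top of the image towards the bottom in roughly `steps` steps."""
    for pc in percents:
        sh = height * pc // 100          # strip height in pixels
        omax = height - sh               # top offset of the lowest strip
        step = max(1, omax // steps)     # guard: never a zero step
        for off in range(0, omax, step):
            yield pc, sh, off
```

Each window corresponds to a crop box `(0, off, width, off + sh)` in the (left, upper, right, lower) order used by, for example, Pillow's `Image.crop`. A 1000-pixel-tall photo yields 80 strips in total.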
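Here are the 40% strips; each frame of the animation is passed to Tesseract (animation not preserved):
Here are the 20% strips; each frame of the animation is passed to Tesseract (animation not preserved):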
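Once I have the strips, I resize them to a height that suits Tesseract (80 pixels) and clean them up to remove noise. Then I pass them to Tesseract and assess the quality of the recognition, somewhat crudely, by counting the digits it found. Finally, I sort the output by the number of digits; presumably more digits is better...
There are some rough edges and bits you can tweak, but it's a start!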
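The scoring step can be sketched in Python as well. Like the script, it keeps only the digits from each OCR result, discards results with no digits or with six or more, and sorts by digit count; the function name and the (count, digits) tuple format are assumptions for illustration.

```python
def score_candidates(ocr_texts, max_digits=5):
    """Return sorted (digit_count, digits) pairs for OCR results that
    contain between 1 and `max_digits` digits."""
    results = []
    for text in ocr_texts:
        digits = "".join(ch for ch in text if ch in "0123456789")
        if 0 < len(digits) <= max_digits:
            results.append((len(digits), digits))
    return sorted(results)
```

As in the script's output, the most promising candidates (the longest plausible numbers) end up at the bottom of the list.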
#!/bin/bash
image=${1-c1.jpg}
# Remove the temporary files when the script exits
trap 'rm -f tmp_$$.png temp.txt' EXIT

# Make everything that is nearly black go fully black, everything else goes white. Median for noise
# convert -delay 500 c1.jpg c2.jpg c3.jpg -normalize -fuzz 25% -fill black -opaque black -fuzz 0 -fill white +opaque black -median 9 out.gif
convert "${image}" -normalize \
   -fuzz 25% -fill black -opaque black \
   -fuzz 0 -fill white +opaque black \
   -median 9 tmp_$$.png

# Get height of image - h
h=$(identify -format "%h" "${image}")

# Generate strips that are 40%, 30%, 20% and 10% of image height
for pc in 40 30 20 10; do
   # Calculate height of this strip in pixels - sh
   ((sh=(h*pc)/100))
   # Calculate offset from top of picture to top of bottom strip - omax
   ((omax=h-sh))
   # Calculate step size, there will be 20 steps (at least 1 pixel, or the loop never ends)
   ((step=omax/20))
   ((step<1)) && step=1
   # Cut strips sh pixels high from the picture starting at top and working down in 20 steps
   for ((off=0; off<omax; off+=step)); do
      t=$(printf "%05d" $off)
      # Extract strip, reset its page geometry and resize to 80 pixels tall for tesseract
      convert tmp_$$.png -crop x${sh}+0+${off} +repage \
         -resize x80 -median 3 -median 3 -median 3 \
         -threshold 90% slice_${pc}_${t}.png
      # Run slice through tesseract, seeking only digits (result goes to temp.txt)
      tesseract slice_${pc}_${t}.png temp digits quiet
      # Now try and assess quality of output :-) ... by counting number of digits
      digits=$(tr -cd '0-9' < temp.txt)
      ndigits=${#digits}
      [ $ndigits -gt 0 ] && [ $ndigits -lt 6 ] && echo $ndigits:$digits
   done
done | sort -n
Output for Cow 618 (the first number is the number of digits found):
2:11
2:11
3:573
5:33613 <--- not bad
Output for Cow 2755 (the first number is the number of digits found):
2:51
3:071
3:191
3:517
4:2155 <--- pretty close
4:2755 <--- nailed that puppy :-)
4:2755 <--- nailed that puppy :-)
4:5212
5:12755 <--- pretty close
Output for Cow 3174 (the first number is the number of digits found):
3:554
3:734
5:12732
5:31741 <--- pretty close
Cool question, thanks!
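With PIL (Python Imaging Library) you can easily load and manipulate images. Converting to grayscale turns the RGB image into a single brightness channel, which should make it easier to detect levels. To threshold the image (to detect the white board), you can use the point() function to map the colors.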
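The two helpers below sketch what those PIL steps compute, in plain Python so the arithmetic is visible. `to_gray` uses the ITU-R 601-2 luma formula that Pillow documents for `Image.convert("L")` (Pillow's own rounding may differ by one level), and `threshold_table` builds the 256-entry lookup table that `Image.point()` accepts for an "L" image. The cut-off level is an assumption you would tune per set of photos.

```python
def to_gray(r, g, b):
    """Luma from the ITU-R 601-2 transform documented for
    Image.convert("L"), computed with integer arithmetic."""
    return (r * 299 + g * 587 + b * 114) // 1000


def threshold_table(level):
    """Lookup table for Image.point(): gray levels above `level` map to
    white (255), the rest to black (0)."""
    return [255 if v > level else 0 for v in range(256)]
```

With Pillow installed, `Image.open("cow.jpg").convert("L").point(threshold_table(200))` would give a black-and-white image in which only bright areas such as the white board remain; the file name and the level 200 are placeholders.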
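Alternatively, you could write a simple program that shows each image, lets you mark the area containing the number with the mouse, runs Tesseract on that area and renames the file to the result.
That should speed up the process! Writing it with Tkinter, PyGTK, PyQt or some other GUI toolkit should be relatively easy.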
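Edit: I needed a similar program to sort images (though without OCR), so I decided this was a good time and made a first attempt (with OCR!). Back up your images before trying it! Quick manual: choose the image folder with the button at the top left, click a file in the list to display it, then drag a rectangle around the number with the left mouse button. Tesseract reads the selected area and shows the result in a dialog where you can correct it; press Yes to rename the file to that number, or No to skip it.
Here is the pre-alpha program: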
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# test_pil.py
#
# Copyright 2015 John Coppens <john@jcoppens.com>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
# MA 02110-1301, USA.
#
#
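import pygtk
pygtk.require("2.0")
import gtk
import glob
import os.path as osp
from os import rename
import re
import subprocess as sp

temp_image = "/tmp/test_pil.png"
# Case-insensitive, so camera files such as IMG_0001.JPG are listed too
image_re = re.compile(r"\.(?:jpe?g|png|gif)$", re.IGNORECASE)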
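class RecognizeDigits():
    def __init__(self):
        pass

    def process(self, img, x0, y0, x1, y1):
        """ Receive the gtk.Image, and the limits of the selected area (in
            window coordinates!)
            Call Tesseract on the area, and give the possibility to edit the
            result.
            Returns None if NO is pressed, and the OCR'd (and edited) text if OK
        """
        pixbuf = img.get_pixbuf()
        if pixbuf is None:                  # no image loaded yet
            return None
        # Accept rectangles dragged in any direction and clip them to the image
        x0, x1 = max(min(x0, x1), 0), min(max(x0, x1), pixbuf.get_width())
        y0, y1 = max(min(y0, y1), 0), min(max(y0, y1), pixbuf.get_height())
        if x1 - x0 < 2 or y1 - y0 < 2:      # a click, not a selection
            return None
        pb = pixbuf.subpixbuf(x0, y0, x1-x0, y1-y0)
        pb.save(temp_image, "png")
        # "-psm" and its value must be separate arguments
        # (newer Tesseract releases spell the option "--psm")
        out = sp.check_output(("tesseract", temp_image, "stdout",
                               "-psm", "7", "digits"))
        out = out.replace(" ", "").strip()
        dlg = gtk.MessageDialog(type = gtk.MESSAGE_QUESTION,
                                flags = gtk.DIALOG_MODAL,
                                buttons = gtk.BUTTONS_YES_NO,
                                message_format = "The number read is:")
        entry = gtk.Entry()
        entry.set_text(out)
        dlg.get_message_area().pack_start(entry)
        entry.show()
        response = dlg.run()
        nr = entry.get_text()
        dlg.destroy()
        if response == gtk.RESPONSE_YES:
            return nr
        else:
            return None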
class FileSelector(gtk.VBox):
    """ Provides a folder selector (at the top) and a list of files in the
        selected folder. On selecting a file, the FileSelector calls the
        function provided to the constructor (image_viewer)
    """
    def __init__(self, image_viewer):
        gtk.VBox.__init__(self)
        self.image_viewer = image_viewer
        self.imgdir = None

        fc = gtk.FileChooserButton('Select a folder')
        fc.set_action(gtk.FILE_CHOOSER_ACTION_SELECT_FOLDER)
        fc.connect("selection-changed", self.on_file_set)
        self.pack_start(fc, expand = False, fill = True)

        self.tstore = gtk.ListStore(str)
        self.tview = gtk.TreeView(self.tstore)
        self.tsel = self.tview.get_selection()
        self.tsel.connect("changed", self.on_selection_changed)
        renderer = gtk.CellRendererText()
        col = gtk.TreeViewColumn(None, renderer, text = 0)
        self.tview.append_column(col)

        scrw = gtk.ScrolledWindow()
        scrw.add(self.tview)
        self.pack_start(scrw, expand = True, fill = True)

    def on_file_set(self, fcb):
        self.tstore.clear()
        self.imgdir = fcb.get_filename()
        if self.imgdir is None:
            return
        for f in sorted(glob.glob(osp.join(self.imgdir, "*"))):
            if re.search(image_re, f):
                self.tstore.append([osp.basename(f)])

    def on_selection_changed(self, sel):
        model, itr = sel.get_selected()
        if itr is not None:
            base = model.get(itr, 0)
            fname = osp.join(self.imgdir, base[0])
            self.image_viewer(fname)
class Status(gtk.Table):
    """ Small status window which shows the coordinates of the area
        selected in the image
    """
    def __init__(self):
        gtk.Table.__init__(self)
        self.attach(gtk.Label("X"), 1, 2, 0, 1, yoptions = gtk.FILL)
        self.attach(gtk.Label("Y"), 2, 3, 0, 1, yoptions = gtk.FILL)
        self.attach(gtk.Label("Top left:"), 0, 1, 1, 2, yoptions = gtk.FILL)
        self.attach(gtk.Label("Bottom right:"), 0, 1, 2, 3, yoptions = gtk.FILL)

        self.entries = {}
        for coord in (("x0", 1, 2, 1, 2), ("y0", 2, 3, 1, 2),
                      ("x1", 1, 2, 2, 3), ("y1", 2, 3, 2, 3)):
            self.entries[coord[0]] = gtk.Entry()
            self.entries[coord[0]].set_width_chars(6)
            self.attach(self.entries[coord[0]],
                        coord[1], coord[2], coord[3], coord[4],
                        yoptions = gtk.FILL)

    def set_top_left(self, x0, y0):
        self.x0 = x0
        self.y0 = y0
        self.entries["x0"].set_text(str(x0))
        self.entries["y0"].set_text(str(y0))

    def set_bottom_right(self, x1, y1):
        self.x1 = x1
        self.y1 = y1
        self.entries["x1"].set_text(str(x1))
        self.entries["y1"].set_text(str(y1))
class ImageViewer(gtk.ScrolledWindow):
    """ Provides a scrollwindow to move the image around. It also detects
        button press and release events (left button), will call status
        to update the coordinates, and will call task on button release
    """
    def __init__(self, status, task = None):
        gtk.ScrolledWindow.__init__(self)
        self.task = task
        self.status = status
        self.drawing = False
        self.prev_rect = None
        self.imagename = None

        self.viewport = gtk.Viewport()
        self.viewport.connect("button-press-event", self.on_button_pressed)
        self.viewport.connect("button-release-event", self.on_button_released)
        self.viewport.set_events(gtk.gdk.BUTTON_PRESS_MASK |
                                 gtk.gdk.BUTTON_RELEASE_MASK)

        self.img = gtk.Image()
        # Keep the image in the top-left corner, so pointer coordinates are
        # not shifted by centering when the window is larger than the image
        self.img.set_alignment(0, 0)
        self.viewport.add(self.img)
        self.add(self.viewport)

    def set_image(self, fname):
        self.imagename = fname
        self.img.set_from_file(fname)

    def on_button_pressed(self, viewport, event):
        if event.button == 1:               # Left button: select rectangle start
            self.x0, self.y0 = int(event.x), int(event.y)
            self.status.set_top_left(self.x0, self.y0)
            self.drawing = True

    def on_button_released(self, viewport, event):
        if event.button == 1 and self.drawing:  # Left button: select rectangle end
            self.drawing = False
            self.x1, self.y1 = int(event.x), int(event.y)
            self.status.set_bottom_right(self.x1, self.y1)
            if self.task is not None and self.imagename is not None:
                res = self.task().process(self.img, self.x0, self.y0, self.x1, self.y1)
                if not res:                 # NO pressed, or empty text
                    return
                # Keep the original extension instead of always using ".jpeg"
                ext = osp.splitext(self.imagename)[1]
                newname = osp.join(osp.dirname(self.imagename), res + ext)
                print "Renaming", self.imagename, "->", newname
                rename(self.imagename, newname)
                self.imagename = newname
class MainWindow(gtk.Window):
    def __init__(self):
        gtk.Window.__init__(self)
        self.connect("delete-event", self.on_delete_event)
        self.set_size_request(600, 300)
        grid = gtk.Table()

        # Image selector
        files = FileSelector(self.update_image)
        grid.attach(files, 0, 1, 0, 1,
                    yoptions = gtk.FILL | gtk.EXPAND, xoptions = gtk.FILL)

        # Some status information
        self.status = Status()
        grid.attach(self.status, 0, 1, 1, 2,
                    yoptions = gtk.FILL, xoptions = gtk.FILL)

        # The image viewer
        self.viewer = ImageViewer(self.status, RecognizeDigits)
        grid.attach(self.viewer, 1, 2, 0, 2)

        self.add(grid)
        self.show_all()

    def update_image(self, fname):
        self.viewer.set_image(fname)

    def on_delete_event(self, wdg, event):
        gtk.main_quit()

    def run(self):
        gtk.main()


def main():
    mw = MainWindow()
    mw.run()
    return 0

if __name__ == '__main__':
    main()
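One risk behind the "back up your images" warning: on POSIX systems `os.rename` silently replaces an existing file, so two photos read as the same number would overwrite each other (on Windows it raises an error instead). A small hypothetical helper that picks a free name first could look like this; the `_2`, `_3` suffix scheme is an assumption, and the existence check is not race-free if another program writes to the folder at the same time. It runs under both Python 2 and 3.

```python
import os


def free_target(folder, number, ext):
    """Return folder/<number><ext>, or folder/<number>_2<ext>, _3, ...
    if that name is already taken."""
    candidate = os.path.join(folder, number + ext)
    n = 2
    while os.path.exists(candidate):
        candidate = os.path.join(folder, "%s_%d%s" % (number, n, ext))
        n += 1
    return candidate


def rename_to_number(path, number):
    """Rename `path` to <number> plus its original extension without
    overwriting an existing file. Returns the new path."""
    folder, name = os.path.split(path)
    ext = os.path.splitext(name)[1]
    if name == number + ext:            # already has the right name
        return path
    target = free_target(folder, number, ext)
    os.rename(path, target)
    return target
```

In `ImageViewer.on_button_released`, the direct `rename(...)` call could then become `newname = rename_to_number(self.imagename, res)`.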