Extracting cow number from image

ser*_*nte 18 python opencv image-processing computer-vision

Every now and then my mom has to go through photos like these, extract the number from each image, and rename the file to that number. [three example photos]

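I'm trying to use OpenCV, Python and Tesseract to get this done. I'm really lost on how to extract the part of the image that contains the number. How could I do this? Any suggestions are welcome; I'm new to OpenCV.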

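I tried extracting the white rectangular board using thresholding and contours, but to no avail: the RGB values I pick for thresholding don't always work, and I don't know how to choose the right contour.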

EDIT:

Take a look at this paper: http://yoni.wexlers.org/papers/2010TextDetection.pdf. It looks promising.

Mar*_*ell 7

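I have been taking a look at this and had a couple of inspirations along the way...

  1. Tesseract can accept custom dictionaries, and if you dig a little more, it appears that from v3.0 onwards it accepts the command-line parameter digits to make it recognise digits only - that seems a useful idea for your needs.

  2. It may not be necessary to find the board with the digits on it - it may be easier to run Tesseract multiple times on various slices of the image and let it have a try itself, since that is what it is supposed to do.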
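As a rough illustration of point 1, a digits-only run could be driven from Python along these lines. This is only a sketch: the helper names `tesseract_digits_command` and `read_digits` are made up for the example, page segmentation mode 7 (treat the image as a single text line) is an assumption about what suits a strip, and `--psm` is the Tesseract 4+ spelling (3.x releases use `-psm`).

```python
import subprocess

def tesseract_digits_command(image_path, psm=7, exe="tesseract"):
    """Build the argv list for a digits-only Tesseract run (hypothetical helper)."""
    # "stdout" as the output base makes Tesseract print the text instead of
    # writing a .txt file; "digits" is the config that restricts recognition
    # to numeric characters. psm=7 is an assumption: one line of text.
    return [exe, str(image_path), "stdout", "--psm", str(psm), "digits"]

def read_digits(image_path, psm=7):
    """Run Tesseract on one image and keep only the digit characters."""
    result = subprocess.run(tesseract_digits_command(image_path, psm),
                            capture_output=True, text=True, check=True)
    # Drop whitespace and any stray punctuation from the OCR output.
    return "".join(ch for ch in result.stdout if ch in "0123456789")
```

Calling `read_digits("slice.png")` needs a `tesseract` binary on the PATH; the command builder can be inspected without one.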

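So, I decided to preprocess the image by changing everything that is within 25% of black to pure black, and everything else to pure white. That gives preprocessed images like these: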

[three preprocessed images]
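The idea behind that preprocessing can be sketched in plain Python on a grid of grayscale values. This is a simplification, not what ImageMagick does internally: it uses a plain min/max stretch for the normalise step, treats "within 25% of black" as a gray-level cutoff rather than a colour distance, and leaves out the median filter. The function names are invented for the example.

```python
def normalize(gray):
    """Stretch values so the darkest pixel becomes 0 and the brightest 255."""
    flat = [p for row in gray for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:                      # flat image: nothing to stretch
        return [row[:] for row in gray]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in gray]

def near_black_to_black(gray, fuzz=0.25):
    """Pixels within `fuzz` of black become 0, everything else becomes 255."""
    cutoff = int(255 * fuzz)
    return [[0 if p <= cutoff else 255 for p in row] for row in gray]

# Tiny 2x3 "image" of gray levels (0 = black, 255 = white)
gray = [[10, 60, 200],
        [120, 30, 250]]
binary = near_black_to_black(normalize(gray))
```

Only the dark pixels (the digits, ideally) survive as black; the board and most of the background turn white.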

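Next, I generate a series of images and pass them, one at a time, to Tesseract. I decided to assume that the digits are probably between 10% and 40% of the image height, so I made a loop over strips that are 40, 30, 20 and 10% of the image height. I then slide each strip down the image from top to bottom in 20 steps, passing each position to Tesseract, until the strip is essentially at the bottom of the image.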

Here are the 40% strips - each frame of the animation is passed to Tesseract:

[animation]

Here are the 20% strips - each frame of the animation is passed to Tesseract:

[animation]
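The strip geometry described above comes down to a little integer arithmetic. A small Python sketch of it follows; `strip_windows` is a name made up here, and the guard against a zero step is an addition for very small images.

```python
def strip_windows(height, percents=(40, 30, 20, 10), steps=20):
    """Yield (percent, top_offset, strip_height) for every strip position."""
    for pc in percents:
        strip_h = height * pc // 100       # strip height in pixels
        max_off = height - strip_h         # offset of the bottom-most strip
        step = max(1, max_off // steps)    # about `steps` positions per size
        for off in range(0, max_off, step):
            yield pc, off, strip_h

windows = list(strip_windows(1000))
```

For a 1000-pixel-tall image this yields 20 positions for each of the four strip heights, so 80 crops in total would be handed to Tesseract. Heights that do not divide evenly can give one extra position per strip size.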

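Having got the strips, I resize them to hit Tesseract's sweet spot and clean them up from noise etc. Then I pass them to Tesseract and assess the quality of the recognition, somewhat crudely, by counting the number of digits it found. Finally, I sort the output by number of digits - presumably more digits is maybe better...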
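That crude quality measure can be written down in a few lines of Python. The sketch below assumes the OCR text for each strip is already available as a string; `score` and `rank` are illustrative names, and the 1 to 5 digit window is the one the answer's script uses.

```python
def score(ocr_text, min_digits=1, max_digits=5):
    """Return (digit_count, digits) for one OCR result, or None if rejected."""
    digits = "".join(ch for ch in ocr_text if ch in "0123456789")
    if min_digits <= len(digits) <= max_digits:
        return len(digits), digits
    return None

def rank(ocr_outputs):
    """Score every OCR result and sort so the longest digit runs come last."""
    scored = [s for s in map(score, ocr_outputs) if s is not None]
    return sorted(scored)

candidates = rank(["2 7 5 5\n", "", "51", "1-2755", "1234567"])
```

The empty string and the seven-digit result are discarded, and the remaining candidates come out ordered by digit count, shortest first.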

The script below has some rough edges and bits you can tinker with, but it's a start!

#!/bin/bash
image=${1-c1.jpg}

# Make everything that is nearly black go fully black, everything else goes white. Median for noise
# convert -delay 500 c1.jpg c2.jpg c3.jpg -normalize -fuzz 25% -fill black -opaque black -fuzz 0 -fill white +opaque black -median 9 out.gif
   convert "${image}" -normalize \
       -fuzz 25% -fill black -opaque black \
       -fuzz 0   -fill white +opaque black \
       -median 9 tmp_$$.png 

# Get height of image - h
h=$(identify -format "%h" "${image}")

# Generate strips that are 40%, 30%, 20% and 10% of image height
for pc in 40 30 20 10; do
   # Calculate height of this strip in pixels - sh
   ((sh=(h*pc)/100))
   # Calculate offset from top of picture to top of bottom strip - omax
   ((omax=h-sh))
   # Calculate step size, there will be 20 steps
   ((step=omax/20))
   # Guard against a zero step (very small images), which would loop forever
   ((step<1)) && step=1

   # Cut strips sh pixels high from the picture starting at top and working down in 20 steps
   for (( off=0;off<$omax;off+=$step)) do
      t=$(printf "%05d" $off)
      # Extract strip and resize to 80 pixels tall for tesseract
      convert tmp_$$.png -crop x${sh}+0+${off}      \
          -resize x80 -median 3 -median 3 -median 3 \
          -threshold 90% +repage slice_${pc}_${t}.png

      # Run slice through tesseract, seeking only digits
      tesseract slice_${pc}_${t}.png temp digits quiet

      # Now try and assess quality of output :-) ... by counting number of digits
      digits=$(tr -cd "[0-9]" < temp.txt)
      ndigits=${#digits}
      [ $ndigits -gt 0 ] && [ $ndigits -lt 6 ] && echo $ndigits:$digits
   done
done | sort -n

Output for Cow 618 (the first number is the number of digits found)

2:11
2:11
3:573
5:33613    <--- not bad

Output for Cow 2755 (the first number is the number of digits found)

2:51
3:071
3:191
3:517
4:2155   <--- pretty close
4:2755   <--- nailed that puppy :-)
4:2755   <--- nailed that puppy :-)
4:5212
5:12755  <--- pretty close

Output for Cow 3174 (the first number is the number of digits found)

3:554
3:734
5:12732
5:31741  <--- pretty close

Cool problem - thank you!


jco*_*ens 5

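With PIL (Python Imaging Library) you can easily load images and process them. Converting RGB to grayscale should make it easier to detect levels. If you want to threshold the image (to detect the white board), you can use the point() function to map the colours.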
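A minimal sketch of that idea, assuming the Pillow fork of PIL is installed. The function name `find_bright_region` and the threshold of 200 are illustrative guesses, not values tuned on the cow photos.

```python
from PIL import Image

def find_bright_region(img, level=200):
    """Return the bounding box (left, upper, right, lower) of bright pixels.

    Returns None when no pixel reaches `level`.
    """
    gray = img.convert("L")                                # RGB -> grayscale
    mask = gray.point(lambda p: 255 if p >= level else 0)  # threshold map
    return mask.getbbox()                                  # box around non-zero pixels

# Synthetic stand-in for a photo: dark background with a white "board"
img = Image.new("RGB", (100, 60), (40, 40, 40))
img.paste((255, 255, 255), (20, 10, 70, 40))
box = find_bright_region(img)
```

On a real photo, anything else that is bright (sky, white patches on the cow) would stretch the box, so this is a starting point at best. The box can be passed to `img.crop(box)` to cut out the candidate region.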

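On the other hand, you could write a simple program that lets you

  • select and show an image
  • mark the area of the board
  • crop the image
  • apply Tesseract or whatever
  • save the image under the detected number

This should speed up the process! Writing it should be relatively easy using TkInter, PyGTK, PyQt or some other windowing toolkit.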

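EDIT: I needed a similar program to categorise images - though without OCRing them. So I finally decided this was as good a time as any and made a first attempt (with OCR!). Make a backup of your images before trying it! Quick manual: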

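  • Top left: select the working folder. If there are any images in the folder, an image list will appear.
  • Select an image. Select the area of the image containing the number. The coordinates will appear in the lower left corner, and the program will call Tesseract.
  • Edit - if necessary - the OCR'd number in the dialog.
  • Click OK to accept - the image will be renamed.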
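The final rename step can be isolated into a small helper, sketched here for Python 3. Two things in it are assumptions added for safety rather than part of the program in this answer: it keeps the file's original extension, and it refuses to overwrite an existing file. The name `rename_to_number` is made up.

```python
import os

def rename_to_number(image_path, number):
    """Rename image_path to <number><original extension> in the same folder."""
    if not (number.isascii() and number.isdigit()):
        raise ValueError("expected a string of digits, got %r" % (number,))
    folder, old_name = os.path.split(image_path)
    ext = os.path.splitext(old_name)[1]          # e.g. ".jpg", kept as-is
    new_path = os.path.join(folder, number + ext)
    if os.path.exists(new_path):                 # never clobber another photo
        raise FileExistsError(new_path)
    os.rename(image_path, new_path)
    return new_path
```

For example, `rename_to_number("/photos/IMG_0001.jpg", "2755")` would produce `/photos/2755.jpg`, and a second photo read as 2755 would raise instead of replacing the first.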

Here's the pre-alpha program:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
#  test_pil.py
#  
#  Copyright 2015 John Coppens <john@jcoppens.com>
#  
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#  
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#  
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
#  MA 02110-1301, USA.
#  
#  

import pygtk
import gtk
import glob
import os.path as osp
from os import rename
import re
import subprocess as sp

temp_image = "/tmp/test_pil.png"
image_re = """\.(?:jpe?g|png|gif)$"""

class RecognizeDigits():
    def __init__(self):
        pass

    def process(self, img, x0, y0, x1, y1):
        """ Receive the gtk.Image, and the limits of the selected area (in
            window coordinates!)
            Call Tesseract on the area, and give the possibility to  edit the
            result.
            Returns None if NO is pressed, and the OCR'd (and edited) text if OK
        """
        pb = img.get_pixbuf().subpixbuf(x0, y0, x1-x0, y1-y0)
        pb.save(temp_image, "png")

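        # The option and its value must be separate arguments
        # (Tesseract 4 and later spell the option --psm)
        out = sp.check_output(("tesseract", temp_image, "stdout", "-psm", "7", "digits"))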
        out = out.replace(" ", "").strip()

        dlg = gtk.MessageDialog(type = gtk.MESSAGE_QUESTION,
                                flags = gtk.DIALOG_MODAL,
                                buttons = gtk.BUTTONS_YES_NO,
                                message_format = "The number read is:")
        entry = gtk.Entry()
        entry.set_text(out)
        dlg.get_message_area().pack_start(entry)
        entry.show()
        response = dlg.run()
        nr = entry.get_text()

        dlg.destroy()

        if response == gtk.RESPONSE_YES:
            return nr
        else:
            return None

class FileSelector(gtk.VBox):
    """ Provides a folder selector (at the top) and a list of files in the
        selected folder. On selecting a file, the FileSelector calls the
        function provided to the constructor (image_viewer)
    """
    def __init__(self, image_viewer):
        gtk.VBox.__init__(self)
        self.image_viewer = image_viewer

        fc = gtk.FileChooserButton('Select a folder')
        fc.set_action(gtk.FILE_CHOOSER_ACTION_SELECT_FOLDER)
        fc.connect("selection-changed", self.on_file_set)
        self.pack_start(fc, expand = False, fill = True)

        self.tstore = gtk.ListStore(str)
        self.tview = gtk.TreeView(self.tstore)
        self.tsel = self.tview.get_selection()
        self.tsel.connect("changed", self.on_selection_changed)
        renderer = gtk.CellRendererText()
        col = gtk.TreeViewColumn(None, renderer, text = 0)
        self.tview.append_column(col)

        scrw = gtk.ScrolledWindow()
        scrw.add(self.tview)
        self.pack_start(scrw, expand = True, fill = True)

    def on_file_set(self, fcb):
        self.tstore.clear()
        self.imgdir = fcb.get_filename()
        for f in glob.glob(self.imgdir + "/*"):
            if re.search(image_re, f):
                self.tstore.append([osp.basename(f)])

    def on_selection_changed(self, sel):
        model, itr = sel.get_selected()
        if itr != None:
            base = model.get(itr, 0)
            fname = self.imgdir + "/" + base[0]
            self.image_viewer(fname)

class Status(gtk.Table):
    """ Small status window which shows the coordinates of the area
        selected in the image
    """
    def __init__(self):
        gtk.Table.__init__(self)

        self.attach(gtk.Label("X"), 1, 2, 0, 1, yoptions = gtk.FILL)
        self.attach(gtk.Label("Y"), 2, 3, 0, 1, yoptions = gtk.FILL)
        self.attach(gtk.Label("Top left:"), 0, 1, 1, 2, yoptions = gtk.FILL)
        self.attach(gtk.Label("Bottom right:"), 0, 1, 2, 3, yoptions = gtk.FILL)

        self.entries = {}
        for coord in (("x0", 1, 2, 1, 2), ("y0", 2, 3, 1, 2),
                      ("x1", 1, 2, 2, 3), ("y1", 2, 3, 2, 3)):
            self.entries[coord[0]] = gtk.Entry()
            self.entries[coord[0]].set_width_chars(6)
            self.attach(self.entries[coord[0]],
                                     coord[1], coord[2], coord[3], coord[4],
                                     yoptions = gtk.FILL)

    def set_top_left(self, x0, y0):
        self.x0 = x0
        self.y0 = y0
        self.entries["x0"].set_text(str(x0))
        self.entries["y0"].set_text(str(y0))

    def set_bottom_right(self, x1, y1):
        self.x1 = x1
        self.y1 = y1
        self.entries["x1"].set_text(str(x1))
        self.entries["y1"].set_text(str(y1))

class ImageViewer(gtk.ScrolledWindow):
    """ Provides a scrollwindow to move the image around. It also detects
        button press and release events (left button), will call status
        to update the coordinates, and will call task on button release
    """
    def __init__(self, status, task = None):
        gtk.ScrolledWindow.__init__(self)

        self.task = task
        self.status = status
        self.drawing = False
        self.prev_rect = None

        self.viewport = gtk.Viewport()
        self.viewport.connect("button-press-event", self.on_button_pressed)
        self.viewport.connect("button-release-event", self.on_button_released)
        self.viewport.set_events(gtk.gdk.BUTTON_PRESS_MASK | \
                                 gtk.gdk.BUTTON_RELEASE_MASK)

        self.img = gtk.Image()
        self.viewport.add(self.img)
        self.add(self.viewport)

    def set_image(self, fname):
        self.imagename = fname
        self.img.set_from_file(fname)

    def on_button_pressed(self, viewport, event):
        if event.button == 1:       # Left button: Select rectangle start
            #self.x0, self.y0 = self.translate_coordinates(self.img, int(event.x), int(event.y))
            self.x0, self.y0 = int(event.x), int(event.y)
            self.status.set_top_left(self.x0, self.y0)
            self.drawing = True

    def on_button_released(self, viewport, event):
        if event.button == 1:       # Left button: Select rectangle end
            #self.x1, self.y1 = self.translate_coordinates(self.img, int(event.x), int(event.y))
            self.x1, self.y1 = int(event.x), int(event.y)
            self.status.set_bottom_right(self.x1, self.y1)
            if self.task != None:
                res = self.task().process(self.img, self.x0, self.y0, self.x1, self.y1)

                if res is None: return

                # Keep the original extension instead of always using ".jpeg"
                newname = osp.join(osp.dirname(self.imagename),
                                   res + osp.splitext(self.imagename)[1])
                rename(self.imagename, newname)
                print "Renaming ", self.imagename, newname

class MainWindow(gtk.Window):
    def __init__(self):
        gtk.Window.__init__(self)
        self.connect("delete-event", self.on_delete_event)
        self.set_size_request(600, 300)

        grid = gtk.Table()

        # Image selector
        files = FileSelector(self.update_image)
        grid.attach(files, 0, 1, 0, 1,
                           yoptions = gtk.FILL | gtk.EXPAND, xoptions = gtk.FILL)

        # Some status information
        self.status = Status()
        grid.attach(self.status, 0, 1, 1, 2,
                                 yoptions = gtk.FILL, xoptions = gtk.FILL)

        # The image viewer
        self.viewer = ImageViewer(self.status, RecognizeDigits)
        grid.attach(self.viewer, 1, 2, 0, 2)
        self.add(grid)

        self.show_all()

    def update_image(self, fname):
        self.viewer.set_image(fname)

    def on_delete_event(self, wdg, data):
        gtk.main_quit()

    def run(self):
        gtk.mainloop()

def main():
    mw = MainWindow()
    mw.run()
    return 0

if __name__ == '__main__':
    main()