How can I use OpenCV + Tesseract on Android for accurate text recognition?

aro*_*rak 4 ocr android opencv tesseract

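I am trying to use OpenCV (Android) to process an image captured with the camera and then pass it to Tesseract for text (digit) recognition, but I only get good results when the image is very clean (almost noise-free). Currently I process the captured image as follows:

  1. Apply Gaussian blur.
  2. Adaptive threshold: binarize the image.
  3. Invert the colors so that the background is black.

The processed image is then passed to Tesseract.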
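For reference, the three steps above can be sketched in plain Java on a grayscale `int[][]` image. This is an illustration only, not the asker's actual code: the class name `PreprocessSketch`, the 3x3 kernel, the window `radius` of 2, and the offset `c` of 5 are all assumptions chosen for the example. On Android the same steps would normally go through OpenCV's `Imgproc.GaussianBlur`, `Imgproc.adaptiveThreshold`, and `Core.bitwise_not`.

```java
// Sketch only: plain-Java stand-ins for the three preprocessing steps.
// Kernel size, radius and offset are illustrative assumptions, not tuned values.
final class PreprocessSketch {

    // Clamp an index into [0, n - 1] so border pixels reuse their nearest neighbour.
    static int clamp(int i, int n) {
        return Math.max(0, Math.min(n - 1, i));
    }

    // Step 1: 3x3 Gaussian blur with the kernel [1 2 1; 2 4 2; 1 2 1] / 16.
    static int[][] gaussianBlur3x3(int[][] gray) {
        int[][] k = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};
        int h = gray.length, w = gray[0].length;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int sum = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        sum += k[dy + 1][dx + 1] * gray[clamp(y + dy, h)][clamp(x + dx, w)];
                    }
                }
                out[y][x] = (sum + 8) / 16; // rounded integer division
            }
        }
        return out;
    }

    // Step 2: adaptive mean threshold. A pixel becomes 255 when it is brighter
    // than (mean of its local window - c), otherwise 0.
    static int[][] adaptiveMeanThreshold(int[][] gray, int radius, int c) {
        int h = gray.length, w = gray[0].length;
        int side = 2 * radius + 1;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                long sum = 0;
                for (int dy = -radius; dy <= radius; dy++) {
                    for (int dx = -radius; dx <= radius; dx++) {
                        sum += gray[clamp(y + dy, h)][clamp(x + dx, w)];
                    }
                }
                double mean = (double) sum / (side * side);
                out[y][x] = gray[y][x] > mean - c ? 255 : 0;
            }
        }
        return out;
    }

    // Step 3: invert, so dark strokes on a light page become white on black.
    static int[][] invert(int[][] img) {
        int[][] out = new int[img.length][img[0].length];
        for (int y = 0; y < img.length; y++) {
            for (int x = 0; x < img[0].length; x++) {
                out[y][x] = 255 - img[y][x];
            }
        }
        return out;
    }

    // The whole pipeline in the order described above.
    static int[][] preprocess(int[][] gray) {
        return invert(adaptiveMeanThreshold(gaussianBlur3x3(gray), 2, 5));
    }
}
```

Calling `PreprocessSketch.preprocess(gray)` returns an image in which dark strokes on a lighter background come out white on black. The window radius and the offset `c` are the values most likely to need adjusting for noisy camera images.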

But I am not getting good results.

Please suggest steps or measures I can take to process the image further, either before it goes to Tesseract or during the Tesseract processing stage.

Also, are there any other, better libraries for this on Android?

Amm*_*CSE 10

You can isolate (detect) the characters in the image before running OCR. Algorithms such as the Stroke Width Transform (SWT) are well suited to this.

The following steps worked well for me:

  1. Obtain a grayscale version of the image.
  2. Perform Canny edge detection on the grayscale image.
  3. Apply Gaussian blur to the grayscale image (store the result in a separate matrix).
  4. Input the matrices from steps 2 and 3 into the SWT algorithm.
  5. Binarize (threshold) the resulting image.
  6. Feed the image to Tesseract.

Please note that for step 4 you will need to build the C++ library in the link and then import it into your Android project with JNI wrappers. You will also need to fine-tune every step to get the best results, but this should at least get you started.
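As a rough illustration of steps 1 and 5 only, here is a minimal plain-Java sketch that converts packed ARGB pixels (the layout Android's `Bitmap.getPixels` returns) to grayscale and then binarizes them. The answer does not say which thresholding method to use; Otsu's method is an assumption picked for the example, and the class name `BinarizeSketch` is hypothetical. Canny, the blur, and SWT itself are not shown. With OpenCV on Android, the equivalent calls would be `Imgproc.cvtColor` and `Imgproc.threshold` with `Imgproc.THRESH_OTSU`.

```java
// Sketch only: grayscale conversion (step 1) and a global binarization (step 5).
// Otsu's method is an assumed choice; the answer does not name a threshold method.
final class BinarizeSketch {

    // Step 1: convert packed ARGB pixels to 0..255 gray using the common
    // 0.299 / 0.587 / 0.114 luma weights, in integer arithmetic.
    static int[] toGray(int[] argb) {
        int[] gray = new int[argb.length];
        for (int i = 0; i < argb.length; i++) {
            int r = (argb[i] >> 16) & 0xFF;
            int g = (argb[i] >> 8) & 0xFF;
            int b = argb[i] & 0xFF;
            gray[i] = (299 * r + 587 * g + 114 * b + 500) / 1000;
        }
        return gray;
    }

    // Otsu's method: pick the threshold t that maximizes the between-class
    // variance of the two groups "value <= t" and "value > t".
    static int otsuThreshold(int[] gray) {
        int[] hist = new int[256];
        for (int v : gray) {
            hist[v]++;
        }
        long total = gray.length;
        long sumAll = 0;
        for (int i = 0; i < 256; i++) {
            sumAll += (long) i * hist[i];
        }
        long weightLow = 0;
        long sumLow = 0;
        double bestScore = -1.0;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            weightLow += hist[t];
            sumLow += (long) t * hist[t];
            if (weightLow == 0) {
                continue; // no pixels in the low class yet
            }
            long weightHigh = total - weightLow;
            if (weightHigh == 0) {
                break; // every pixel is already in the low class
            }
            double meanLow = (double) sumLow / weightLow;
            double meanHigh = (double) (sumAll - sumLow) / weightHigh;
            double diff = meanLow - meanHigh;
            double score = (double) weightLow * weightHigh * diff * diff;
            if (score > bestScore) {
                bestScore = score;
                best = t;
            }
        }
        return best;
    }

    // Step 5: pixels above the threshold become 255, all others 0.
    static int[] binarize(int[] gray, int threshold) {
        int[] out = new int[gray.length];
        for (int i = 0; i < gray.length; i++) {
            out[i] = gray[i] > threshold ? 255 : 0;
        }
        return out;
    }
}
```

A typical call chain would be `binarize(gray, otsuThreshold(gray))` on the output of `toGray`. A global threshold like this tends to suit evenly lit images; for uneven lighting, an adaptive threshold may be the better fit.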