Deblur an image containing text to be recognized by OCR

Question

Art*_*tur 9 c++ ocr opencv image-processing

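I have a blurred image:
This is part of a business card; it is one of the frames captured by a camera without proper focus.

The sharp image looks like this:

I am looking for a method that gives me a better-quality image so that it can be recognized by OCR, but it also has to be fast. The image is not very blurred (I think), but it is not good enough for OCR. I tried:

different kinds of HPF (high-pass filters),
Laplacian,
Canny detector,
combinations of morphological operations (opening, closing).

I also tried:

deconvolution with a Wiener filter,
deconvolution with the Lucy-Richardson method.

But it is not easy to find the right PSF (point spread function). These methods are considered effective, but they are not fast enough. I also tried an FFT followed by an IFFT with a Gaussian mask, but the results were not satisfactory. I am looking for a general method for deblurring images that contain text, not just this one image. Can anyone help me with this problem? I would appreciate any advice. I am using OpenCV 3 (C++, sometimes Python).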
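
A minimal sketch of the Wiener-filter deconvolution mentioned above, written with NumPy only. It assumes the blur is roughly Gaussian and that the noise-to-signal ratio is a constant; `gaussian_psf`, `wiener_deblur`, `sigma` and `nsr` are illustrative names and values, not taken from the original attempt, and the FFT makes the convolution circular (borders wrap around).

```python
import numpy as np

def gaussian_psf(shape, sigma):
    """Gaussian PSF in FFT layout (peak at index [0, 0]), normalised to sum to 1."""
    h, w = shape
    # Wrapped distances from the origin along each axis.
    yy = np.minimum(np.arange(h), h - np.arange(h))[:, None]
    xx = np.minimum(np.arange(w), w - np.arange(w))[None, :]
    psf = np.exp(-(yy ** 2 + xx ** 2) / (2.0 * sigma ** 2))
    return psf / psf.sum()

def wiener_deblur(gray, sigma=2.0, nsr=0.01):
    """Wiener deconvolution of a 2-D grayscale image with an assumed Gaussian PSF.

    sigma: assumed blur width in pixels (a guess that has to be tuned).
    nsr:   assumed constant noise-to-signal power ratio (regularisation).
    """
    g = np.asarray(gray, dtype=np.float64)
    H = np.fft.fft2(gaussian_psf(g.shape, sigma))
    G = np.fft.fft2(g)
    # Wiener filter: conj(H) / (|H|^2 + NSR), applied in the frequency domain.
    F = np.conj(H) / (np.abs(H) ** 2 + nsr) * G
    return np.real(np.fft.ifft2(F))
```

The cost is two FFTs per image, so speed is rarely the issue with this variant; the difficulty remains choosing `sigma` and `nsr`, which is exactly the PSF problem described above.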

Answer 1

Ali*_*Ali 16

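Are you aware of blind deconvolution?

Blind deconvolution is a well-known technique for restoring astronomical images. It is especially useful in applications where the PSF is hard to find.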
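
To make the idea concrete, here is a rough NumPy sketch of one classic blind scheme: alternating Richardson-Lucy updates of the image and of the PSF. It is not the C++ implementation or the paper's algorithm referred to in this answer; the function name, `psf_radius`, `n_iter` and the flat initial PSF are assumptions made for illustration, and the FFT-based convolutions are circular.

```python
import numpy as np

def blind_richardson_lucy(blurred, psf_radius=4, n_iter=20, eps=1e-8):
    """Alternate Richardson-Lucy updates of the PSF and of the image.

    Returns (image_estimate, psf_estimate); the PSF is in FFT layout
    (centre at index [0, 0]) and has the same shape as the image.
    """
    y = np.clip(np.asarray(blurred, dtype=np.float64), eps, None)
    h, w = y.shape

    # The PSF may only be non-zero inside a small window around the origin.
    yy = np.minimum(np.arange(h), h - np.arange(h))[:, None]
    xx = np.minimum(np.arange(w), w - np.arange(w))[None, :]
    support = (yy <= psf_radius) & (xx <= psf_radius)

    psf = support / support.sum()  # flat initial guess (assumption)
    x = y.copy()                   # start from the blurred image

    for _ in range(n_iter):
        # PSF update with the image held fixed.
        X = np.fft.fft2(x)
        est = np.real(np.fft.ifft2(X * np.fft.fft2(psf)))
        ratio = y / np.maximum(est, eps)
        psf = psf * np.real(np.fft.ifft2(np.fft.fft2(ratio) * np.conj(X)))
        psf = np.clip(psf, 0.0, None) * support
        psf = psf / max(psf.sum(), eps)

        # Image update with the new PSF held fixed.
        K = np.fft.fft2(psf)
        est = np.real(np.fft.ifft2(X * K))
        ratio = y / np.maximum(est, eps)
        x = x * np.real(np.fft.ifft2(np.fft.fft2(ratio) * np.conj(K)))
        x = np.clip(x, 0.0, None)

    return x, psf
```

Schemes like this are iterative and sensitive to initialisation, so they tend to be slower and less predictable than a fixed filter.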

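Here is a C++ implementation of this technique. This paper is also very relevant to what you are looking for. Here is a sample output of their algorithm: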

Answer 2

M.I*_*nat 9

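I recently ran into this problem as well and asked a similar question with more details and more recent approaches. So far it appears to be an unsolved problem. Several recent research efforts try to tackle this kind of problem with deep learning. Unfortunately, none of them met our expectations. Still, I am sharing the information in case it helps anyone.

1. Scene Text Image Super-Resolution in the Wild

In our case this may be our last resort; comparatively, it performs well enough. It is a recent research work (TSRN) that focuses mainly on such cases. Its main idea is to introduce super-resolution (SR) techniques as pre-processing. So far this implementation looks the most promising. Here is an illustration of their results, turning blurred images into clean ones.

2. Neural Enhance

Judging from the demo in their repo, it may also have some potential for improving blurred text. However, the author has apparently not maintained the repo for about 4 years.

3. Blind Motion Deblurring with GAN

The appealing part is its blind motion deblurring mechanism, named DeblurGAN. It looks promising.

4. Real-World Super-Resolution via Kernel Estimation and Noise Injection

An interesting fact about their work is that, unlike other works in the literature, they first design a novel degradation framework for real-world images by estimating various blur kernels as well as real noise distributions. Based on that, they obtain LR images that share a common domain with real-world images. They then propose a real-world super-resolution model aimed at better perception. From their article:

However, in my experiments I could not get the expected results. I raised an issue on GitHub and have not received any response so far.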

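Convolutional Neural Networks for Direct Text Deblurring

The paper shared by @Ali looks very interesting, and the results are very good. It is nice that they shared the pre-trained weights of their trained model along with Python scripts for easier use. However, they experimented with the Caffe library. I would prefer to convert it to PyTorch for better control. Below are the provided Python scripts with the Caffe imports. Note that, due to my lack of Caffe knowledge, I have not been able to port it completely so far; please correct me if you know how.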

    from __future__ import print_function
    import numpy as np
    # The unused Python 2 `Queue` import is dropped so the script also loads on Python 3.
    import os, sys, argparse, glob, time, cv2, caffe

    # Some helper functions
    def getCutout(image, x1, y1, x2, y2, border):
        assert(x1 >= 0 and y1 >= 0)
        assert(x2 > x1 and y2 > y1)
        assert(border >= 0)
        return cv2.getRectSubPix(image, (y2-y1 + 2*border, x2-x1 + 2*border), (((y2-1)+y1) / 2.0, ((x2-1)+x1) / 2.0))

    def fillRndData(data, net):
        inputLayer = 'data'
        randomChannels = net.blobs[inputLayer].data.shape[1]
        rndData = np.random.randn(data.shape[0], randomChannels, data.shape[2], data.shape[3]).astype(np.float32) * 0.2
        rndData[:,0:1,:,:] = data
        net.blobs[inputLayer].data[...] = rndData[:,0:1,:,:]

    def mkdirp(directory):
        if not os.path.isdir(directory):
            os.makedirs(directory)
The main function starts here:

    def main(argv):
        pycaffe_dir = os.path.dirname(__file__)

        parser = argparse.ArgumentParser()
        # Required and optional arguments.
        parser.add_argument("--model_def", help="Model definition file.", required=True)
        parser.add_argument("--pretrained_model", help="Trained model weights file.", required=True)
        parser.add_argument("--out_scale", help="Scale of the output image.", default=1.0, type=float)
        parser.add_argument("--output_path", help="Output path.", default='')
        parser.add_argument("--tile_resolution", help="Resolution of processing tile.", required=True, type=int)
        parser.add_argument("--suffix", help="Suffix of the output file.", default="-deblur")
        parser.add_argument("--gpu", action='store_true', help="Switch for gpu computation.")
        parser.add_argument("--grey_mean", action='store_true', help="Use grey mean RGB=127. Default is the VGG mean.")
        parser.add_argument("--use_mean", action='store_true', help="Use mean.")
        parser.add_argument("--adversarial", action='store_true', help="Feed the input through fillRndData() and net.forward().")
        args = parser.parse_args()

        mkdirp(args.output_path)

        if hasattr(caffe, 'set_mode_gpu'):
            if args.gpu:
                print('GPU mode', file=sys.stderr)
                caffe.set_mode_gpu()
            net = caffe.Net(args.model_def, args.pretrained_model, caffe.TEST)
        else:
            if args.gpu:
                print('GPU mode', file=sys.stderr)
            net = caffe.Net(args.model_def, args.pretrained_model, gpu=args.gpu)

        inputs = [line.strip() for line in sys.stdin]
        print("Classifying %d inputs." % len(inputs), file=sys.stderr)

        # list(...) keeps the indexing valid on Python 3, where keys() is a view.
        inputBlob = list(net.blobs.keys())[0]  # [innat]: input shape
        outputBlob = list(net.blobs.keys())[-1]
        print(inputBlob, outputBlob)

        channelCount = net.blobs[inputBlob].data.shape[1]
        net.blobs[inputBlob].reshape(1, channelCount, args.tile_resolution, args.tile_resolution)
        net.reshape()

        if channelCount == 1 or channelCount > 3:
            color = 0
        else:
            color = 1

        outResolution = net.blobs[outputBlob].data.shape[2]
        inResolution = int(outResolution / args.out_scale)
        # Integer division: the border is used as a pixel count in getCutout().
        boundary = (net.blobs[inputBlob].data.shape[2] - inResolution) // 2

        for fileName in inputs:
            img = cv2.imread(fileName, flags=color).astype(np.float32)
            original = np.copy(img)
            img = img.reshape(img.shape[0], img.shape[1], -1)
            if args.use_mean:
                if args.grey_mean or channelCount == 1:
                    img -= 127
                else:
                    img[:,:,0] -= 103.939
                    img[:,:,1] -= 116.779
                    img[:,:,2] -= 123.68
            img *= 0.004

            outShape = [int(img.shape[0] * args.out_scale),
                        int(img.shape[1] * args.out_scale),
                        net.blobs[outputBlob].channels]
            imgOut = np.zeros(outShape)

            imageStartTime = time.time()
            # Process the image tile by tile.
            for x, xOut in zip(range(0, img.shape[0], inResolution), range(0, imgOut.shape[0], outResolution)):
                for y, yOut in zip(range(0, img.shape[1], inResolution), range(0, imgOut.shape[1], outResolution)):
                    start = time.time()

                    region = getCutout(img, x, y, x+inResolution, y+inResolution, boundary)
                    region = region.reshape(region.shape[0], region.shape[1], -1)
                    data = region.transpose([2, 0, 1]).reshape(1, -1, region.shape[0], region.shape[1])

                    if args.adversarial:
                        fillRndData(data, net)
                        out = net.forward()
                    else:
                        out = net.forward_all(data=data)

                    out = out[outputBlob].reshape(out[outputBlob].shape[1], out[outputBlob].shape[2], out[outputBlob].shape[3]).transpose(1, 2, 0)

                    # Undo the input scaling and mean subtraction.
                    if imgOut.shape[2] == 3 or imgOut.shape[2] == 1:
                        out /= 0.004
                        if args.use_mean:
                            if args.grey_mean:
                                out += 127
                            else:
                                out[:,:,0] += 103.939
                                out[:,:,1] += 116.779
                                out[:,:,2] += 123.68

                    if out.shape[0] != outResolution:
                        print("Warning: size of net output is %d px and it is expected to be %d px" % (out.shape[0], outResolution))
                    if out.shape[0] < outResolution:
                        print("Error: size of net output is %d px and it is expected to be %d px" % (out.shape[0], outResolution))
                        sys.exit(1)

                    xRange = min((outResolution, imgOut.shape[0] - xOut))
                    yRange = min((outResolution, imgOut.shape[1] - yOut))
                    imgOut[xOut:xOut+xRange, yOut:yOut+yRange, :] = out[0:xRange, 0:yRange, :]

                    print(".", end="", file=sys.stderr)
                    sys.stdout.flush()

            print(imgOut.min(), imgOut.max())
            print("IMAGE DONE %s" % (time.time() - imageStartTime))
            basename = os.path.basename(fileName)
            name = os.path.join(args.output_path, basename + args.suffix)
            print(name, imgOut.shape)
            cv2.imwrite(name, imgOut)

    if __name__ == '__main__':
        main(sys.argv)
Run the program:

    cat fileListToProcess.txt | python processWholeImage.py --model_def ./BMVC_nets/S14_19_200.deploy --pretrained_model ./BMVC_nets/S14_19_FQ_178000.model --output_path ./out/ --tile_resolution 300 --suffix _out.png --use_mean --gpu

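The weight files and the script above can be downloaded from here (BMVC_net). However, you may want to convert the model from Caffe to PyTorch. To do that, here is a basic starting point:

Install proto-lens

Clone caffemodel2pytorch

Next,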

    import torch
    import caffemodel2pytorch

    # BMVC_net: you need to download it from the authors' website (link above)
    model = caffemodel2pytorch.Net(
        prototxt = './BMVC_net/S14_19_200.deploy',
        weights = './BMVC_net/S14_19_FQ_178000.model',
        caffe_proto = 'https://raw.githubusercontent.com/BVLC/caffe/master/src/caffe/proto/caffe.proto'
    )
    model.cuda()
    model.eval()
    torch.set_grad_enabled(False)
Run it on a demo tensor,

    # make sure to have right procedure of image normalization and channel reordering
    image = torch.Tensor(8, 3, 98, 98).cuda()

    # outputs dict of PyTorch Variables
    # in this example the dict contains the only key "prob"
    #output_dict = model(data = image)

    # you can remove unneeded layers:
    #del model.prob
    #del model.fc8

    # a single input variable is interpreted as an input blob named "data"
    # in this example the dict contains the only key "fc7"
    output_dict = model(image)
    # print(output_dict)
    print(output_dict.keys())
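Note that there are some basic things to consider: the network expects text at 120-150 DPI, a reasonable orientation, and reasonable black and white levels. The network expects [103.9, 116.8, 123.7] to be subtracted from the input. The input should additionally be multiplied by 0.004.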
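
The normalization described above can be sketched in NumPy as follows. The mean values and the 0.004 factor are the ones used in the script; the function names, the BGR channel order (what `cv2.imread` returns), and the final rounding and clipping to 8 bits are my assumptions, so treat this as a starting point and not as the authors' exact pipeline.

```python
import numpy as np

# Per-channel mean in BGR order and the input scale used by the script above.
MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)
SCALE = 0.004

def preprocess(img_bgr):
    """HxWx3 uint8 BGR image -> 1x3xHxW float32 network input."""
    x = img_bgr.astype(np.float32) - MEAN_BGR   # subtract the mean per channel
    x *= SCALE                                  # scale the input
    return x.transpose(2, 0, 1)[np.newaxis]     # HWC -> NCHW with batch size 1

def postprocess(net_out):
    """1x3xHxW float32 network output -> HxWx3 uint8 BGR image."""
    y = net_out[0].transpose(1, 2, 0) / SCALE + MEAN_BGR
    # Rounding and clipping to [0, 255] is an assumption for saving as 8-bit.
    return np.clip(np.rint(y), 0, 255).astype(np.uint8)
```

A tensor prepared this way could replace the uninitialized `torch.Tensor(8, 3, 98, 98)` demo input after conversion with `torch.from_numpy`, provided the converted model expects the same layout.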

Asked: 7 years, 10 months ago
Viewed: 636 times
Last active: 7 years, 10 months ago