Par*_*eog 1 python opencv computer-vision
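I have two types of image slices: one has 6 checkboxes and the other has 5.
My approach (does not work well)
I computed the mean of the image with np.mean(image) and set a threshold (140): if the mean is greater than the threshold, the image has six checkboxes, otherwise it has five. The idea behind this is that, as far as I can tell, slices with six checkboxes have more black pixels than slices with five.
Question
So, what else can I do to get an accurate classification? I am using Python 3.6 and OpenCV, so solutions using those would be appreciated.
Optionally, although I do not have enough data of this kind to run a deep learning process, I would love to know whether deep learning could help here as well.
Thanks.
Edit
I forgot to mention that I also tried finding contours and shapes (squares and rectangles), but the results are inconsistent, both because the resolution is low and because the boxes may contain tick marks. I get 2-3 boxes, which is not enough to tell the two types apart.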
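After several failed attempts, the following approach seems to give satisfactory results on the provided input dataset.
On first inspection, I noticed that all the sample images have the same shape, so I could easily stack them. I started by examining a single image containing all the input images stacked vertically (using numpy.vstack).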
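As a minimal sketch of that stacking step, the snippet below uses small synthetic arrays as hypothetical stand-ins for the sample slices (the real inputs would come from cv2.imread). It only illustrates why identical shapes matter for numpy.vstack.

```python
import numpy as np

# Hypothetical stand-ins for the sample slices: three grayscale images
# of identical shape (height 4, width 6), each filled with one gray level.
images = [np.full((4, 6), level, dtype=np.uint8) for level in (0, 128, 255)]

# vstack requires every image to have the same width; the heights add up.
stacked = np.vstack(images)
print(stacked.shape)
```

Viewing the stacked image makes it easy to see which columns line up across samples.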
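I made the following observations:
Using an image editor, I determined that the following mask is a good estimate of the checkbox positions:
Or, in Python code, giving the first/last column of each zone: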
# Define the zones (x axis ranges) where checkboxes may occur
zones_a = [(50, 72), (144, 166), (243, 265), (328, 350), (436, 458)] # 5 box scenario
zones_b = [(42, 64), (122, 144), (207, 229), (276, 298), (369, 391), (496, 518)] # 6 box scenario
With this in mind, I arrived at the following approach:
For demonstration, I will pick one of the nasty ones:
First, I read it as a grayscale image:
img = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
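Then I binarize it. Adaptive thresholding with a fairly large block size seems to do a good job of removing most of the noise while keeping the relevant detail (even though, in this case, a lot of unwanted junk still remains):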
thresh = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 15, 2)
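Note: since at this point we are dealing with black text on a white background, the meaning of erode and dilate is inverted. Erosion expands the black parts and dilation shrinks them. (This is intuitive once you understand the subject.)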
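The polarity point can be checked on a toy row. The sketch below is a plain NumPy stand-in for a 1x3 erosion and dilation (local minimum and maximum), not the OpenCV implementation; the helper names erode_1x3 and dilate_1x3 are made up for this illustration, and border handling may differ from cv2.

```python
import numpy as np

def erode_1x3(img):
    """Horizontal 1x3 erosion: minimum of each pixel and its left/right neighbours."""
    padded = np.pad(img, ((0, 0), (1, 1)), mode="edge")
    return np.minimum(np.minimum(padded[:, :-2], padded[:, 1:-1]), padded[:, 2:])

def dilate_1x3(img):
    """Horizontal 1x3 dilation: maximum of each pixel and its left/right neighbours."""
    padded = np.pad(img, ((0, 0), (1, 1)), mode="edge")
    return np.maximum(np.maximum(padded[:, :-2], padded[:, 1:-1]), padded[:, 2:])

# A single black pixel (0) on a white (255) background.
row = np.array([[255, 255, 255, 0, 255, 255, 255]], dtype=np.uint8)

eroded = erode_1x3(row)    # the black stroke grows to three pixels
dilated = dilate_1x3(row)  # the black stroke disappears
print(eroded)
print(dilated)
```

On black-on-white input, erosion therefore thickens dark strokes and dilation thins them, which is the behaviour the next two steps rely on.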
Next, I use a morphological operation to emphasize the vertical edges:
thresh = cv2.morphologyEx(thresh, cv2.MORPH_ERODE, np.ones((1,3),np.uint8))
Then I de-emphasize the horizontal edges (which include most of the text):
thresh = cv2.morphologyEx(thresh, cv2.MORPH_DILATE, np.ones((3,1),np.uint8))
Next, I find all the edges using the Canny edge detector:
edges = cv2.Canny(thresh, 40, 120, apertureSize=5)
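Note: now the edges are white and the rest is black, so the morphological operations work the way one would naively expect. (Again, this is intuitive once you understand the subject.)
Now I apply a morphological opening in order to eliminate the horizontal edges (which by now are usually single-pixel lines) while keeping the vertical edges: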
edges = cv2.morphologyEx(edges, cv2.MORPH_OPEN, np.ones((5,1),np.uint8))
I follow this with a dilation to emphasize the vertical edges:
edges = cv2.morphologyEx(edges, cv2.MORPH_DILATE, np.ones((1,3),np.uint8))
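After inspecting the preprocessed images, I noticed that wherever a checkbox is present there are a number of columns consisting mostly of white pixels, which is not the case elsewhere.
I used a technique called "vertical projection" to reduce the 2D image to 1 dimension by taking the mean intensity of each column: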
projection = np.mean(edges, 0).flatten()
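To illustrate what the projection captures, here is a small sketch on a synthetic edge image (made up for this example, not one of the samples). Full-height vertical lines produce high column means, while a horizontal line only raises every column slightly.

```python
import numpy as np

# Synthetic 6x8 "edge" image: white (255) edges on a black background.
edges = np.zeros((6, 8), dtype=np.uint8)
edges[:, 2] = 255   # full-height vertical edge
edges[:, 5] = 255   # another full-height vertical edge
edges[3, :] = 255   # one horizontal line across the whole width

# Mean over axis 0 collapses the rows, leaving one value per column.
projection = np.mean(edges, axis=0)
print(projection)
```

Columns 2 and 5 reach the maximum of 255, while the remaining columns only get the contribution of the single horizontal row (255 / 6).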
I smooth it using an averaging filter roughly as wide as each potential checkbox position:
projection = cv2.blur(projection, (1, 21)).flatten()
And then I smooth it again at half the span:
projection = cv2.blur(projection, (1, 11)).flatten()
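The effect of the two passes can be sketched on a made-up signal. This example swaps in np.convolve with a box kernel as a stand-in for cv2.blur (an assumption made to keep the sketch NumPy-only; the two differ in how they treat the borders), and box_smooth is a hypothetical helper name.

```python
import numpy as np

def box_smooth(signal, width):
    """Moving average with a box kernel of the given (odd) width."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="same")

# Made-up projection: three nearby spikes, as the edges of one checkbox might produce.
signal = np.zeros(60)
signal[[20, 24, 28]] = 1.0

pass1 = box_smooth(signal, 21)  # roughly one checkbox wide
pass2 = box_smooth(pass1, 11)   # half the span

print(int(pass2.argmax()), round(float(pass2.max()), 4))
```

After both passes the three separate spikes merge into one broad bump centred around column 24, which is much easier for a peak finder to handle than the raw spikes.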
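The final projection curve now has distinct peaks where the checkboxes are.
The following image shows the result of this processing (yellow = raw, red = pass 1, blue = pass 2).
The next step is to find the peaks in this curve. scipy.signal.find_peaks turned out to give the desired result: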
peaks = find_peaks(projection)[0]
Since multiple peaks may occur within the zone of a single box, I also store the projection value at each peak (so that I can discriminate between them later):
peak_values = projection[peaks]
Now I can generate a nice plot showing the likely locations of the checkboxes, along with the detected peaks and the ranges where checkboxes are expected in the two scenarios.
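In this plot, the zones of the 5-box scenario are shaded yellow, the zones of the 6-box scenario are shaded red, and each detected peak is marked with a magenta vertical line.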
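At this point, I know where checkboxes may be located (the peak positions), along with an indication of how likely that is (the value at the peak). This is enough to decide which scenario is the better fit.
The first step is to "bin the peaks". For each scenario there is a set of zones, each specifying a minimum and maximum X coordinate. I use the following function to collect the peaks that fall into each potential checkbox location: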
def bin_peaks(peaks, values, zones):
    # One (initially empty) list of (position, value) pairs per zone
    bins = [[] for _ in range(len(zones))]
    for peak, value in zip(peaks, values):
        for i, zone in enumerate(zones):
            # Zone limits are inclusive on both ends
            if (peak >= zone[0]) and (peak <= zone[1]):
                bins[i].append((peak, value))
    return bins
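At this point, for each potential checkbox position, I have a list of zero or more peaks that correspond to it.
To be able to determine which of the two scenarios fits better, I need to reduce things to a single floating-point value representing the quality of the match. The rule is simple: the scenario with the higher quality metric wins.
As a starting point, I chose to use the sum of the weights of all positions, normalized by the number of positions.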
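For each position there are 3 possibilities: no peaks (the weight is 0), a single peak (the weight is the value of that peak), or multiple peaks (the weight is the value of the highest peak).
In code: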
def analyze_bins(bins):
    total_weight = 0.0
    for zone_peaks in bins:
        weight = 0.0
        if len(zone_peaks) > 0:
            # Pick the (position, value) pair with the highest value
            best_peak = max(zone_peaks, key=lambda x: x[1])
            weight = best_peak[1]
        total_weight += weight
    # Normalize by the number of positions in the scenario
    total_weight /= len(bins)
    return total_weight
Debug output of this algorithm for each scenario:
At this point I have a single metric for each scenario, and the decision is simple: the higher one wins.
weight_a = analyze_bins(bins_a)
weight_b = analyze_bins(bins_b)
checkbox_count = 5 if (weight_a > weight_b) else 6
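As a worked example of the binning and scoring, the sketch below runs the same logic on made-up peak positions and values (they are not taken from any sample image). The zones are the ones defined earlier; the compact score helper is a hypothetical restatement of the weighting rule, not the exact function above.

```python
zones_a = [(50, 72), (144, 166), (243, 265), (328, 350), (436, 458)]              # 5 boxes
zones_b = [(42, 64), (122, 144), (207, 229), (276, 298), (369, 391), (496, 518)]  # 6 boxes

def bin_peaks(peaks, values, zones):
    """Collect (position, value) pairs that fall inside each zone (inclusive limits)."""
    bins = [[] for _ in zones]
    for peak, value in zip(peaks, values):
        for i, (lo, hi) in enumerate(zones):
            if lo <= peak <= hi:
                bins[i].append((peak, value))
    return bins

def score(bins):
    """Best peak value per zone (0 if the zone is empty), averaged over all zones."""
    return sum(max((v for _, v in b), default=0.0) for b in bins) / len(bins)

# Made-up peaks: one per 5-box zone, plus a weaker second peak in the third zone.
peaks = [60, 155, 250, 255, 340, 445]
values = [80.0, 90.0, 70.0, 40.0, 85.0, 75.0]

weight_a = score(bin_peaks(peaks, values, zones_a))
weight_b = score(bin_peaks(peaks, values, zones_b))
checkbox_count = 5 if weight_a > weight_b else 6
print(weight_a, weight_b, checkbox_count)
```

Here every 5-box zone catches a peak, so weight_a is (80 + 90 + 70 + 85 + 75) / 5 = 80, while only the first 6-box zone catches one, giving weight_b = 80 / 6. Normalizing by the number of zones keeps the 6-box scenario from winning merely because it has more zones.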
And here is an image summarizing the results for all the sample inputs:
The complete script, which generates all the reports:
import glob
import io

import cv2
import numpy as np
from scipy.signal import find_peaks

# ============================================================================
# Define the zones (x axis ranges) where checkboxes may occur
zones_a = [(50, 72), (144, 166), (243, 265), (328, 350), (436, 458)] # 5 box scenario
zones_b = [(42, 64), (122, 144), (207, 229), (276, 298), (369, 391), (496, 518)] # 6 box scenario

# ============================================================================
# Bonus -- plot a detailed analysis report as a PNG image
def plot_report(filename, report):
    from matplotlib import pyplot as plt
    from matplotlib.gridspec import GridSpec

    IMAGE_KEYS = ['img', 'thresh', 'thresh_1', 'thresh_2', 'canny', 'canny_1', 'canny_2']
    PLOT_SPAN = 5
    TEXT_SPAN = 2
    ROW_COUNT = (len(IMAGE_KEYS) + 1) + 3 * (PLOT_SPAN + 1) + 2 * (TEXT_SPAN)

    fig = plt.figure()
    plt.suptitle(filename)
    gs = GridSpec(ROW_COUNT, 2)

    # One row per intermediate image
    row = 0
    for key in IMAGE_KEYS:
        plt.subplot(gs[row,:])
        plt.gca().set_title(key)
        plt.imshow(report[key], cmap='gray', aspect='equal')
        plt.axis('off')
        row += 1

    proj_width = len(report['projection'])
    proj_x = np.arange(proj_width)

    plt.subplot(gs[row+1:row+1+PLOT_SPAN,:])
    plt.gca().set_title('Vertical Projections (Raw and Smoothed)')
    plt.plot(proj_x, report['projection'], 'y-')
    plt.plot(proj_x, report['projection_1'], 'r-')
    plt.plot(proj_x, report['projection_2'], 'b-')
    plt.xlim((0, proj_width - 1))
    plt.ylim((0, 255))
    row += PLOT_SPAN + 1

    plt.subplot(gs[row+1:row+1+PLOT_SPAN,:])
    plt.gca().set_title('Smoothed Projection with Peaks and Zones')
    plt.plot(proj_x, report['projection_2'])
    for zone in zones_a:
        plt.axvspan(zone[0], zone[1], facecolor='y', alpha=0.1)
    for zone in zones_b:
        plt.axvspan(zone[0], zone[1], facecolor='r', alpha=0.1)
    for x in report['peaks']:
        plt.axvline(x=x, color='m')
    plt.xlim((0, proj_width - 1))
    plt.ylim((0, report['projection_2'].max()))
    row += PLOT_SPAN + 1

    plt.subplot(gs[row+1:row+1+TEXT_SPAN,0], frameon=False)
    plt.gca().set_title('Details - 5 boxes')
    plt.axis([0, 1, 0, 1])
    plt.gca().axes.get_yaxis().set_visible(False)
    plt.gca().axes.get_xaxis().set_visible(False)
    plt.text(0, 1, report['details_a'], family='monospace', fontsize=8, ha='left', va='top')

    plt.subplot(gs[row+1:row+1+TEXT_SPAN,1], frameon=False)
    plt.gca().set_title('Details - 6 boxes')
    plt.axis([0, 1, 0, 1])
    plt.gca().axes.get_yaxis().set_visible(False)
    plt.gca().axes.get_xaxis().set_visible(False)
    plt.text(0, 1, report['details_b'], family='monospace', fontsize=8, ha='left', va='top')
    row += TEXT_SPAN

    plt.subplot(gs[row+1:row+1+PLOT_SPAN,:])
    plt.gca().set_title('Weights')
    plt.barh([2, 1],
             [report['weight_a'], report['weight_b']],
             align='center',
             color=['y', 'r'],
             tick_label=['5 boxes', '6 boxes'])
    plt.ylim((0.5, 2.5))
    row += PLOT_SPAN + 1
    row += 1

    plt.subplot(gs[row,:])
    plt.gca().set_title('Input Image')
    plt.imshow(report['img'], cmap='gray', aspect='equal')
    plt.axis('off')
    row += 1

    plt.subplot(gs[row:row+TEXT_SPAN,:], frameon=False)
    plt.axis([0, 1, 0, 1])
    plt.gca().axes.get_yaxis().set_visible(False)
    plt.gca().axes.get_xaxis().set_visible(False)
    result_text = "The image contains %d boxes." % report['checkbox_count']
    plt.text(0.5, 1, result_text, family='monospace', weight='semibold', fontsize=24, ha='center', va='top')

    fig.set_size_inches(12, ROW_COUNT * 0.8)
    plt.savefig('plot_%s.png' % filename[:2], bbox_inches="tight")
    plt.close(fig)

# ----------------------------------------------------------------------------
# Bonus - create summary image showing inputs along with coloured result annotations.
# `results` is a list of (grayscale image, checkbox count) tuples.
def summary_report(results):
    ROW_HEIGHT = results[0][0].shape[0]
    images = [entry[0] for entry in results]
    stacked = np.vstack(images)
    # Add an 80 pixel wide margin on the left for the text labels
    extended = cv2.copyMakeBorder(stacked, 0, 0, 80, 0, cv2.BORDER_CONSTANT)
    summary = cv2.cvtColor(extended, cv2.COLOR_GRAY2BGR)
    for i, (_, count) in enumerate(results):
        cv2.putText(summary, '%d boxes' % count,
                    (4, ROW_HEIGHT * (i+1) - 4),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.5,
                    [(0, 255, 255), (0, 0, 255)][count - 5],  # yellow for 5, red for 6
                    1)
    return summary

# ============================================================================
# Collect peaks that fall into each potential checkbox location
def bin_peaks(peaks, values, zones):
    bins = [[] for _ in range(len(zones))]
    for peak, value in zip(peaks, values):
        for i, zone in enumerate(zones):
            if (peak >= zone[0]) and (peak <= zone[1]):
                bins[i].append((peak, value))
    return bins

# ----------------------------------------------------------------------------
# Select best peaks for each bin, weigh them and return total weight + details text
def analyze_bins(bins):
    buf = io.StringIO()
    total_weight = 0.0
    for i, zone_peaks in enumerate(bins):
        buf.write("Position %d: " % i)
        weight = 0.0
        if len(zone_peaks) == 0:
            buf.write("no peaks")
        else:
            best_peak = max(zone_peaks, key=lambda x: x[1])
            weight = best_peak[1]
            if len(zone_peaks) == 1:
                buf.write("single peak @ %d (value=%0.3f)" % best_peak)
            else:
                buf.write("%d peaks, best @ %d (value=%0.3f)" % (len(zone_peaks), best_peak[0], best_peak[1]))
        buf.write(" | weight=%0.3f\n" % weight)
        total_weight += weight
    total_weight /= len(bins)
    buf.write("Total weight = %0.3f" % total_weight)
    return total_weight, buf.getvalue()

# ----------------------------------------------------------------------------
# Process an input image, return checkbox count along with detailed debugging info in a dict
def process_image(filename):
    report = {}

    img = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
    report['img'] = img.copy()

    thresh = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 15, 2)
    report['thresh'] = thresh.copy()
    thresh = cv2.morphologyEx(thresh, cv2.MORPH_ERODE, np.ones((1,3),np.uint8))
    report['thresh_1'] = thresh.copy()
    thresh = cv2.morphologyEx(thresh, cv2.MORPH_DILATE, np.ones((3,1),np.uint8))
    report['thresh_2'] = thresh.copy()

    edges = cv2.Canny(thresh, 40, 120, apertureSize=5)
    report['canny'] = edges.copy()
    edges = cv2.morphologyEx(edges, cv2.MORPH_OPEN, np.ones((5,1),np.uint8))
    report['canny_1'] = edges.copy()
    edges = cv2.morphologyEx(edges, cv2.MORPH_DILATE, np.ones((1,3),np.uint8))
    report['canny_2'] = edges.copy()

    projection = np.mean(edges, 0).flatten()
    report['projection'] = projection.copy()
    projection = cv2.blur(projection, (1, 21)).flatten()
    report['projection_1'] = projection.copy()
    projection = cv2.blur(projection, (1, 11)).flatten()
    report['projection_2'] = projection.copy()

    peaks = find_peaks(projection)[0]
    report['peaks'] = peaks.copy()
    peak_values = projection[peaks]
    report['peak_values'] = peak_values.copy()

    bins_a = bin_peaks(peaks, peak_values, zones_a)
    report['bins_a'] = list(bins_a)
    bins_b = bin_peaks(peaks, peak_values, zones_b)
    report['bins_b'] = list(bins_b)

    weight_a, details_a = analyze_bins(bins_a)
    report['weight_a'] = weight_a
    report['details_a'] = details_a
    weight_b, details_b = analyze_bins(bins_b)
    report['weight_b'] = weight_b
    report['details_b'] = details_b

    checkbox_count = 5 if (weight_a > weight_b) else 6
    report['checkbox_count'] = checkbox_count
    return checkbox_count, report

# ============================================================================
results = []
for filename in glob.glob('*-*.png'):
    box_count, report = process_image(filename)
    plot_report(filename, report)
    results.append((report['img'], report['checkbox_count']))

cv2.imwrite('summary.png', summary_report(results))
Feel free to correct any typos, and let me know if anything needs clarification.