沿一个轴组合附近的边界框

Question

沿一个轴组合附近的边界框

tit*_*ata 6 python opencv python-3.x google-vision

在这里，我使用 Google Vision API 从下图中检测文本。红色框表示我想获得的组合边界框的样本。

基本上，我从上图中得到了文本输出和边界框。在这里，我想合并位于同一行（从左到右）的边界框和文本。例如，第一行将合并在一起：

[{'description': '?????????????????',
  'vertices': [(528, 202), (741, 202), (741, 222), (528, 222)]},
 {'description': '??????',
 'vertices': [(754, 204), (809, 204), (809, 222), (754, 222)]},
 ...

Run Code Online (Sandbox Code Playgroud)

到

[{'description': '??????????????????????',
  'vertices': [(528, 202), (809, 202), (809, 222), (528, 222)]},
 ...

Run Code Online (Sandbox Code Playgroud)

以下这些行

 {'description': 'RP',
  'vertices': [(729, 1072), (758, 1072), (758, 1091), (729, 1091)]},
 {'description': '8147',
  'vertices': [(768, 1072), (822, 1072), (822, 1092), (768, 1092)]},
 {'description': '3609',
  'vertices': [(834, 1073), (889, 1073), (889, 1093), (834, 1093)]},
 {'description': '7',
  'vertices': [(900, 1073), (911, 1073), (911, 1092), (900, 1092)]},
 {'description': 'TH',

Run Code Online (Sandbox Code Playgroud)

将合并在一起。

当前方法

我研究了 -使用 OpenCV 的解决方案 -非最大抑制算法

但不能为我的需要生产一个特定的，因为它依赖于重叠像素的百分比。如果有人可以提供帮助，那就太好了！

请尝试在此处使用边界框数据：https : //gist.github.com/titipata/fd44572f7f6c3cc1dfbac05fb86f6081

Answer 1

Zab*_*azi 7

输入：

out = [{'description': '?????????????????',
  'vertices': [(528, 202), (741, 202), (741, 222), (528, 222)]},
 {'description': '??????',
 'vertices': [(754, 204), (809, 204), (809, 222), (754, 222)]},
 {'description': 'RP',
  'vertices': [(729, 1072), (758, 1072), (758, 1091), (729, 1091)]},
 {'description': '8147',
  'vertices': [(768, 1072), (822, 1072), (822, 1092), (768, 1092)]},
 {'description': '3609',
  'vertices': [(834, 1073), (889, 1073), (889, 1093), (834, 1093)]},
 {'description': '7',
  'vertices': [(900, 1073), (911, 1073), (911, 1092), (900, 1092)]}
]

Run Code Online (Sandbox Code Playgroud)

我假设，这 4 个元组分别代表左上角、右上角、右下角和左下角坐标的 x、y 坐标（按顺序）。
首先，我们需要找到所有在 x 方向上接近且在 y 方向上几乎相同（位置相同）的 bbox 对。注意：如果遗漏了某些内容，您可能需要调整这两个阈值。

import numpy as np

pairs = []

threshold_y = 4 # height threshold
threshold_x = 20 # x threshold

for i in range(len(out)):
    for j in range(i+1, len(out)):
        left_upi, right_upi, right_lowi, left_lowi = out[i]['vertices']
        left_upj, right_upj, right_lowj, left_lowj = out[j]['vertices']
        # first of all, they should be in the same height range, starting Y axis should be almost same
        # their starting x axis is close upto a threshold
        cond1 = (abs(left_upi[1] - left_upj[1]) < threshold_y)
        cond2 = (abs(right_upi[0] - left_upj[0]) < threshold_x)
        cond3 = (abs(right_upj[0] - left_upi[0]) < threshold_x)

        if cond1 and (cond2 or cond3):
            pairs.append([i,j])

Run Code Online (Sandbox Code Playgroud)

出去：

pairs
[[0, 1], [2, 3], [3, 4], [4, 5]]

Run Code Online (Sandbox Code Playgroud)

现在，我们只有对，但我们还需要找到所有连接的组件，例如，我们知道 0、1 在一个组件中，而 2、3、4、5 在另一个组件中。（通常，图算法最适合此任务，但为了简单起见，我进行了迭代搜索）

merged_pairs = []

for i in range(len(pairs)):
    cur_set = set()
    p = pairs[i]

    done = False
    for k in range(len(merged_pairs)):
        if p[0] in merged_pairs[k]:
            merged_pairs[k].append(p[1])
            done = True
        if p[1] in merged_pairs[k]:
            merged_pairs[k].append(p[0])
            done = True

    if done:
        continue

    cur_set.add(p[0])
    cur_set.add(p[1])

    match_idx = []
    while True:
        num_match = 0
        for j in range(i+1, len(pairs)):
            p2 = pairs[j]

            if j not in match_idx and (p2[0] in cur_set or p2[1] in cur_set):
                cur_set.add(p2[0])
                cur_set.add(p2[1])
                num_match += 1
                match_idx.append(j)

        if num_match == 0:
            break
    merged_pairs.append(list(cur_set))

merged_pairs = [list(set(a)) for a in merged_pairs]

Run Code Online (Sandbox Code Playgroud)

出去：

merged_pairs
[[0, 1], [2, 3, 4, 5]]

Run Code Online (Sandbox Code Playgroud)

替代 networkx 解决方案：

（如果你不介意使用额外的networkx导入，会更短）

import networkx as nx

g = nx.Graph()
g.add_edges_from([[0, 1], [2, 3], [3, 4], [4, 5]]) # pass pairs here

gs = [list(a) for a in list(nx.connected_components(g))] # get merged pairs here
print(gs)

Run Code Online (Sandbox Code Playgroud)

[[0, 1], [2, 3, 4, 5]]

现在，我们有了所有连接的组件，我们可以根据它们的起始 x 坐标对它们进行排序，并合并边界框。

# for connected components, sort them according to x-axis and merge

out_final = []

INF = 999999999 # a large number greater than any co-ordinate
for idxs in merged_pairs:
    c_bbox = []

    for i in idxs:
        c_bbox.append(out[i])

    sorted_x = sorted(c_bbox, key =  lambda x: x['vertices'][0][0])

    new_sol = {}
    new_sol['description'] = ''
    new_sol['vertices'] = [[INF, INF], [-INF, INF], [-INF, -INF], [INF, -INF]]
    for k in sorted_x:
        new_sol['description'] += k['description']

        new_sol['vertices'][0][0] = min(new_sol['vertices'][0][0], k['vertices'][0][0])
        new_sol['vertices'][0][1] = min(new_sol['vertices'][0][1], k['vertices'][0][1])

        new_sol['vertices'][1][0] = max(new_sol['vertices'][1][0], k['vertices'][1][0])
        new_sol['vertices'][1][1] = min(new_sol['vertices'][1][1], k['vertices'][1][1])


        new_sol['vertices'][2][0] = max(new_sol['vertices'][2][0], k['vertices'][2][0])
        new_sol['vertices'][2][1] = max(new_sol['vertices'][2][1], k['vertices'][2][1])        

        new_sol['vertices'][3][0] = min(new_sol['vertices'][3][0], k['vertices'][3][0])
        new_sol['vertices'][3][1] = max(new_sol['vertices'][3][1], k['vertices'][3][1])  


    out_final.append(new_sol)

Run Code Online (Sandbox Code Playgroud)

最终输出：

out_final
[{'description': '???????????????????????',
  'vertices': [[528, 202], [809, 202], [809, 222], [528, 222]]},
 {'description': 'RP814736097',
  'vertices': [[729, 1072], [911, 1072], [911, 1093], [729, 1093]]}]

Run Code Online (Sandbox Code Playgroud)

干得好！我会检查答案并在周末接受它！ (2认同)

归档时间：	5 年，9 月前
查看次数：	448 次
最近记录：	5 年，9 月前