Wiz*_*ard 5 python opencv image-processing
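Context: I am performing object localization and want to implement an inhibition-of-return mechanism (i.e. draw a black cross on the image where the red bounding box was after the trigger action).
Problem: I don't know how to accurately scale the bounding box (red) relative to the original input (init_input). If the scaling is done correctly, the black cross should land exactly in the middle of the red bounding box.
My current code for this function is as follows: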
def IoR(b, init_input, prev_coord):
    """
    Inhibition-of-Return mechanism.
    Marks the region of the image covered by
    the bounding box with a black cross.
    :param b:
        The current bounding box represented as [x1, y1, x2, y2].
    :param init_input:
        The initial input volume of the current episode.
    :param prev_coord:
        The previous state's bounding box coordinates (x1, y1, x2, y2)
    """
    x1, y1, x2, y2 = prev_coord
    width = 12
    x_mid = (b[2] + b[0]) // 2
    y_mid = (b[3] + b[1]) // 2

    # Define vertical rectangle coordinates
    ver_x1 = int(((x_mid) * IMG_SIZE / (x2 - x1)) - width)
    ver_x2 = int(((x_mid) * IMG_SIZE / (x2 - x1)) + width)
    ver_y1 = int((b[1]) * IMG_SIZE / (y2 - y1))
    ver_y2 = int((b[3]) * IMG_SIZE / (y2 - y1))

    # Define horizontal rectangle coordinates
    hor_x1 = int((b[0]) * IMG_SIZE / (x2 - x1))
    hor_x2 = int((b[2]) * IMG_SIZE / (x2 - x1))
    hor_y1 = int(((y_mid) * IMG_SIZE / (y2 - y1)) - width)
    hor_y2 = int(((y_mid) * IMG_SIZE / (y2 - y1)) + width)

    # Draw vertical rectangle
    cv2.rectangle(init_input, (ver_x1, ver_y1), (ver_x2, ver_y2), (0, 0, 0), -1)
    # Draw horizontal rectangle
    cv2.rectangle(init_input, (hor_x1, hor_y1), (hor_x2, hor_y2), (0, 0, 0), -1)
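The desired effect looks like this:
Note: I believe the complexity of this problem comes from the image being resized (to 224, 224, 3) every time I take an action (and so move on to the next state). The "anchor" used to determine the scaling therefore has to be extracted from the previous state's scaling, as shown in the following code: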
def next_state(init_input, b_prime, g):
    """
    Returns the observable region of the next state.
    Formats the next state's observable region, defined
    by b_prime, to be of dimension (224, 224, 3). Adding 16
    additional pixels of context around the original bounding box.
    The ground truth box must be reformatted according to the
    new observable region.
    IMG_SIZE = 224
    :param init_input:
        The initial input volume of the current episode.
    :param b_prime:
        The subsequent state's bounding box.
    :param g: (init_g)
        The initial ground truth box of the target object.
    """
    # Determine the pixel coordinates of the observable region for the following state
    context_pixels = 16
    x1 = max(b_prime[0] - context_pixels, 0)
    y1 = max(b_prime[1] - context_pixels, 0)
    x2 = min(b_prime[2] + context_pixels, IMG_SIZE)
    y2 = min(b_prime[3] + context_pixels, IMG_SIZE)

    # Determine observable region
    observable_region = cv2.resize(init_input[y1:y2, x1:x2], (224, 224), interpolation=cv2.INTER_AREA)

    # Resize ground truth box
    g[0] = int((g[0] - x1) * IMG_SIZE / (x2 - x1))  # x1
    g[1] = int((g[1] - y1) * IMG_SIZE / (y2 - y1))  # y1
    g[2] = int((g[2] - x1) * IMG_SIZE / (x2 - x1))  # x2
    g[3] = int((g[3] - y1) * IMG_SIZE / (y2 - y1))  # y2

    return observable_region, g, (b_prime[0], b_prime[1], b_prime[2], b_prime[3])
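Consider a state t in which the agent is predicting the location of the target object. The target object has a ground-truth box (yellow and dotted in the sketch), and the agent's current "localization box" is the red bounding box. Say that in state t the agent decides it is best to move right. The bounding box is therefore moved to the right, and the next state t' is determined by adding 16 extra pixels of context around the red bounding box, cropping the original image relative to this boundary, and then upscaling the cropped image to 224 x 224.
Now suppose the agent is confident that its prediction is accurate, so it chooses the trigger action. This basically means ending the localization episode for the current target object and placing a black cross where the agent predicts the object to be (i.e. in the middle of the red bounding box). Because the current state was upscaled after being cropped following the previous action, the bounding box must be rescaled relative to the normal/original/initial image before the black cross can be drawn accurately onto it.
In the context of my question, the first rescaling between states works perfectly well (the second code block in this post). Scaling back down to the original image and drawing the black cross, however, is the part I can't seem to figure out.
Here is an image that hopefully helps explain:
Here is the output of my current solution: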
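I think it is better to store the coordinates globally instead of juggling a bunch of upscaled/downscaled coordinates. Those give me a headache, and they can lose precision through rounding.
That is, every time something is detected, convert it to global (original image) coordinates first. I wrote a small demo here that mimics your detection and trigger behavior.
Code: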
import cv2
import matplotlib.pyplot as plt

IMG_SIZE = 224

im = cv2.cvtColor(cv2.imread('lena.jpg'), cv2.COLOR_BGR2GRAY)
im = cv2.resize(im, (IMG_SIZE, IMG_SIZE))

# Your detector results (each box is in the coordinates of the current view)
detected_region = [
    [(10, 20), (80, 100)],
    [(50, 0), (220, 190)],
    [(100, 143), (180, 200)],
    [(110, 45), (180, 150)]
]

# Global states
x_scale = 1.0
y_scale = 1.0
x_shift = 0
y_shift = 0

x1, y1 = 0, 0
x2, y2 = IMG_SIZE-1, IMG_SIZE-1

for region in detected_region:
    # Detection: scale and shift of the current view relative to the original image
    x_scale = IMG_SIZE / (x2-x1)
    y_scale = IMG_SIZE / (y2-y1)
    x_shift = x1
    y_shift = y1

    cur_im = cv2.resize(im[y1:y2, x1:x2], (IMG_SIZE, IMG_SIZE))

    # Assuming the detector returns these results
    cv2.rectangle(cur_im, region[0], region[1], (255))

    plt.imshow(cur_im)
    plt.show()

    # Zooming in, using part of your code; the result is in global coordinates
    context_pixels = 16
    x1 = max(region[0][0] - context_pixels, 0) / x_scale + x_shift
    y1 = max(region[0][1] - context_pixels, 0) / y_scale + y_shift
    x2 = min(region[1][0] + context_pixels, IMG_SIZE) / x_scale + x_shift
    y2 = min(region[1][1] + context_pixels, IMG_SIZE) / y_scale + y_shift
    x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)

# Assuming the detector confirms its choice here
print('Confirmed detection: ', x1, y1, x2, y2)

# This time no padding
x1 = detected_region[-1][0][0] / x_scale + x_shift
y1 = detected_region[-1][0][1] / y_scale + y_shift
x2 = detected_region[-1][1][0] / x_scale + x_shift
y2 = detected_region[-1][1][1] / y_scale + y_shift
x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)

cv2.rectangle(im, (x1, y1), (x2, y2), (255, 0, 0))
plt.imshow(im)
plt.show()
This also avoids resizing an already resized image, which could introduce more artifacts and degrade the detector's performance.
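To connect this back to the cross in IoR, here is a minimal sketch of the inverse of the mapping used in next_state. It assumes the box to be marked is expressed in the coordinates of the resized 224 x 224 view, and that the view was cut from the original image with the same 16-pixel padding and clamping as next_state. The helper names crop_window, view_to_global and cross_rectangles are made up for this illustration and are not part of the original code.

```python
IMG_SIZE = 224  # side length of every resized view, as in the question


def crop_window(box, context_pixels=16, size=IMG_SIZE):
    """Padded crop window (x1, y1, x2, y2), clamped the same way as next_state."""
    x1, y1, x2, y2 = box
    return (max(x1 - context_pixels, 0), max(y1 - context_pixels, 0),
            min(x2 + context_pixels, size), min(y2 + context_pixels, size))


def view_to_global(box, window, size=IMG_SIZE):
    """Map a box from the resized view back to the image the window was cut from.

    Inverse of next_state's mapping: view = (orig - wx1) * size / (wx2 - wx1).
    """
    wx1, wy1, wx2, wy2 = window
    sx = size / (wx2 - wx1)
    sy = size / (wy2 - wy1)
    x1, y1, x2, y2 = box
    return (x1 / sx + wx1, y1 / sy + wy1, x2 / sx + wx1, y2 / sy + wy1)


def cross_rectangles(box, half_width=12):
    """Corner points of the vertical and horizontal bars of a cross centred on box."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    x_mid = (x1 + x2) // 2
    y_mid = (y1 + y2) // 2
    vertical = ((x_mid - half_width, y1), (x_mid + half_width, y2))
    horizontal = ((x1, y_mid - half_width), (x2, y_mid + half_width))
    return vertical, horizontal


# Hypothetical numbers: the previous box in original-image coordinates.
prev_box = (60, 40, 140, 120)
window = crop_window(prev_box)          # region that was resized to 224 x 224
box_in_view = (32, 32, 192, 192)        # box reported by the agent in that view
box_global = view_to_global(box_in_view, window)
vertical, horizontal = cross_rectangles(box_global)
```

Each pair of corner points can then be passed to cv2.rectangle(init_input, pt1, pt2, (0, 0, 0), -1), as in the question's IoR. If the view is the result of several nested crops, the same mapping would have to be applied once per crop, which is why tracking a single global scale and shift, as in the demo above, tends to be simpler.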