Ish*_*ida 0 image-processing computer-vision deep-learning conv-neural-network tensorflow
I want to train an ssd-inception-v2 model with the TensorFlow Object Detection API. The training dataset I want to use is a collection of cropped images of varying sizes without bounding boxes, since each crop is itself the bounding box.
I followed the create_pascal_tf_record.py example and replaced the bounding-box and classification parts accordingly to generate TFRecords, as shown below:
import hashlib
import os

import numpy as np
import tensorflow as tf
from PIL import Image

from object_detection.utils import dataset_util
from object_detection.utils import label_map_util

flags = tf.app.flags
flags.DEFINE_string('data_dir', '', 'Root directory with one sub-directory of crops per class.')
flags.DEFINE_string('output_path', '', 'Basename of the output TFRecord (".record" is appended).')
flags.DEFINE_string('label_map_path', '', 'Path to the label map .pbtxt file.')
FLAGS = flags.FLAGS


def dict_to_tf_example(imagepath, label):
    image = Image.open(imagepath)
    if image.format != 'JPEG':
        print("Skipping file: " + imagepath)
        return None
    img = np.array(image)
    with tf.gfile.GFile(imagepath, 'rb') as fid:
        encoded_jpg = fid.read()
    # Store the image size so the raw serialized string can later be read
    # back, converted to a 1-D array, and reshaped to the original shape.
    height = img.shape[0]
    width = img.shape[1]
    key = hashlib.sha256(encoded_jpg).hexdigest()
    # Hard-code a single box covering nearly the whole image, since each
    # crop is itself the bounding box.
    xmin = [5.0 / 100.0]
    ymin = [5.0 / 100.0]
    xmax = [95.0 / 100.0]
    ymax = [95.0 / 100.0]
    class_text = [label['name'].encode('utf8')]
    classes = [label['id']]
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(imagepath.encode('utf8')),
        'image/source_id': dataset_util.bytes_feature(imagepath.encode('utf8')),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
        'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
        'image/object/class/text': dataset_util.bytes_list_feature(class_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
    }))
    return example
def main(_):
    data_dir = FLAGS.data_dir
    output_path = os.path.join(data_dir, FLAGS.output_path + '.record')
    writer = tf.python_io.TFRecordWriter(output_path)
    label_map = label_map_util.load_labelmap(FLAGS.label_map_path)
    categories = label_map_util.convert_label_map_to_categories(
        label_map, max_num_classes=80, use_display_name=True)
    category_index = label_map_util.create_category_index(categories)
    # Keep only the categories that have a matching image sub-directory.
    category_list = os.listdir(data_dir)
    gen = (category for category in categories
           if category['name'] in category_list)
    for category in gen:
        examples_path = os.path.join(data_dir, category['name'])
        examples_list = os.listdir(examples_path)
        for example in examples_list:
            imagepath = os.path.join(examples_path, example)
            tf_example = dict_to_tf_example(imagepath, category)
            if tf_example is not None:  # skip non-JPEG files
                writer.write(tf_example.SerializeToString())
            # print(tf_example)
    writer.close()


if __name__ == '__main__':
    tf.app.run()
The bounding boxes are hard-coded to cover the whole image, and each label is derived from the image's parent directory. I use mscoco_label_map.pbtxt for the labels and ssd_inception_v2_pets.config as the basis for the pipeline.
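Before training, it can help to sanity-check what actually landed in the record. A minimal sketch, assuming TensorFlow 1.x (matching the tf.python_io usage above) and a hypothetical record path:
import tensorflow as tf

record_path = 'train.record'  # hypothetical path to the file written above
for serialized in tf.python_io.tf_record_iterator(record_path):
    ex = tf.train.Example()
    ex.ParseFromString(serialized)
    feat = ex.features.feature
    print('size :', feat['image/height'].int64_list.value[0],
          'x', feat['image/width'].int64_list.value[0])
    print('label:', feat['image/object/class/text'].bytes_list.value)
    print('xmin :', feat['image/object/bbox/xmin'].float_list.value)
    print('xmax :', feat['image/object/bbox/xmax'].float_list.value)
    break  # only inspect the first example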
I trained the model and froze it for use with the Jupyter notebook example. However, the end result is a single box around the entire image. Any idea what went wrong?
小智 5
Object detection algorithms/networks generally work by predicting both the locations of bounding boxes and their classes. For that reason, the training data normally needs to contain bounding-box annotations. By feeding your model training data whose bounding box is always the same size as the image, you are likely to get garbage predictions, including boxes that always outline the entire image.
This sounds like a problem with your training data. Rather than cropped images, you should provide full images/scenes with the objects annotated; at this point you are essentially training a classifier, not a detector.
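For contrast, here is a minimal sketch of what the per-object feature lists could look like for a full scene containing several annotated objects. The image size, pixel coordinates, and class entries are made up for illustration:
# Hypothetical annotations for one full 640x480 scene with two objects.
# Box coordinates are normalized to [0, 1] by the image width/height,
# one list entry per object, matching the image/object/* features above.
width, height = 640, 480
boxes = [
    # (x0, y0, x1, y1, class name, class id) in pixels -- illustrative only
    (48, 120, 210, 300, 'cat', 17),
    (330, 60, 610, 420, 'dog', 18),
]

xmin = [b[0] / float(width) for b in boxes]
ymin = [b[1] / float(height) for b in boxes]
xmax = [b[2] / float(width) for b in boxes]
ymax = [b[3] / float(height) for b in boxes]
class_text = [b[4].encode('utf8') for b in boxes]
classes = [b[5] for b in boxes]
# These lists then feed the same tf.train.Example features as above,
# replacing the single hard-coded full-image box.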
Try training with properly annotated, uncropped images and see how it goes.