WARNING:tensorflow: Ignoring detection with image id, despite the config parameter being true

Dor*_*mez 6 python object-detection tensorflow tensorboard nvidia-docker

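I am currently trying to train a Faster R-CNN Inception V2 model (pretrained on COCO) on the GTSDB dataset. I have the FullIJCNN dataset and split it into three parts: training, validation, and test. I then created three separate CSV files, and from those the TFRecord files for training and validation. Separately, I have a code block that reads the ground-truth box coordinates for each image and draws boxes around the traffic signs in the image. It also writes the class labels correctly. Here are some examples. To be clear, these boxes are not predicted by the network; they are drawn by a function from the annotations.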

Drawn boxes 1

Drawn boxes 2

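Then I created a label file using the README file included in the dataset folder, and added a 0 background line as the first line of labels.txt to make it work with my code (I think this is something silly), because it was throwing an index error. However, there is no "background" key in my .pbtxt file; it starts from 1. Next, I configured the faster_rcnn_inception_v2_coco.config file: I changed num_classes: 90 to num_classes: 43 because the dataset has 43 classes, and num_examples: 5000 to num_examples: 186 because my split of the dataset has 186 test examples. I left num_steps: 200000 as is. Finally, I started the training job by running the following command: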

python object_detection/model_main.py \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --model_dir=${MODEL_DIR} \
    --num_train_steps=50000 \
    --num_eval_steps=2000 \
    --alsologtostderr

This is the traceback (sorry for the code block; I don't know how to add logs properly):

import matplotlib; matplotlib.use('Agg')  # pylint: disable=multiple-statements
WARNING:tensorflow:Estimator's model_fn (<function model_fn at 0x7fc4cd6a4938>) includes params argument, but params are not passed to Estimator.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/models/research/object_detection/core/box_predictor.py:407: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /home/models/research/object_detection/meta_architectures/faster_rcnn_meta_arch.py:2037: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
WARNING:tensorflow:From /home/models/research/object_detection/core/losses.py:317: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.

/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2018-07-26 09:48:21.785041: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-07-26 09:48:21.923329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 9b2f:00:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-07-26 09:48:21.923382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-26 09:48:22.153991: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-26 09:48:22.154053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-07-26 09:48:22.154075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
2018-07-26 09:48:22.154333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 9b2f:00:00.0, compute capability: 3.7)
2018-07-26 09:58:31.794649: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-26 09:58:31.794723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-26 09:58:31.794747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-07-26 09:58:31.794765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
2018-07-26 09:58:31.794884: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10763 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 9b2f:00:00.0, compute capability: 3.7)
WARNING:tensorflow:Ignoring ground truth with image id 2066941970 since it was previously added
WARNING:tensorflow:Ignoring detection with image id 2066941970 since it was previously added
WARNING:tensorflow:Ignoring ground truth with image id 2013299735 since it was previously added
WARNING:tensorflow:Ignoring detection with image id 2013299735 since it was previously added
WARNING:tensorflow:Ignoring ground truth with image id 1416415107 since it was previously added

It produced many warnings like these:

WARNING:tensorflow:Ignoring ground truth with image id 2013299735 since it was previously added
WARNING:tensorflow:Ignoring detection with image id 2013299735 since it was previously added

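The reason for these messages is that num_examples was set to 2000, even though my original config file contains the line num_examples: 186. I don't understand why it creates a new config with different parameters. In any case, after the log fills up with these messages, it prints a report, but I can't work out what it is trying to tell me. Here is the report: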

creating index...
index created!
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.07s).
Accumulating evaluation results...
DONE (t=0.02s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000

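Finally, I checked TensorBoard to make sure the model was training correctly, but what I saw was discouraging. Here are screenshots of the TensorBoard loss graphs for my model: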

Loss

General loss

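I feel like I am doing something wrong. I don't know whether this is too specific a question, but I have tried to provide as much detail as I can.

My questions are: what should I change in these steps? Why can my function draw the true boxes while my model cannot figure out what is going on? Thanks in advance!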

Anonymous 7

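You are getting the warnings because items in your dataset are being evaluated more than once. The values you specify for num_train_steps and num_eval_steps should be tied to the train_config batch_size and the size of your dataset. For example, if your batch size is 24 and you have 24000 training records, num_train_steps should be set to 1000; num_eval_steps is calculated the same way, but using the number of evaluation records. It appears that the model_main.py script does not use the values from your pipeline.config file when you run it with values specified on the command line.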
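The arithmetic above can be sketched as a small helper. This is only an illustration: steps_for_one_pass is a hypothetical name, not part of the Object Detection API, and the evaluation batch size of 1 is an assumption about the asker's setup. It rounds up so that a final partial batch still counts as a step.

```python
def steps_for_one_pass(num_records, batch_size):
    """Number of steps needed to visit every record exactly once.

    Hypothetical helper for illustration; rounds up so that a final,
    partially filled batch is still counted as one step.
    """
    if num_records < 0 or batch_size <= 0:
        raise ValueError("num_records must be >= 0 and batch_size must be > 0")
    return (num_records + batch_size - 1) // batch_size


# Example from the answer: 24000 training records, batch size 24.
train_steps = steps_for_one_pass(24000, 24)

# The asker's evaluation split has 186 examples; a batch size of 1
# is assumed here purely for illustration.
eval_steps = steps_for_one_pass(186, 1)

print(train_steps)  # 1000
print(eval_steps)   # 186
```

With the asker's numbers, passing --num_eval_steps=2000 asks for far more evaluation steps than there are evaluation records, which is consistent with the same image ids being seen again and reported as previously added.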


Anonymous 5

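I ran into the same problem, and after a while I came up with this solution. It worked for me, but it is certainly not a universal fix: if your dataset is spread across multiple folders and you use a home-made tf_record converter, frame names may collide across the dataset as a whole.

Since I started using the full path as the filename (to avoid collisions), I have never seen the warning again. I hope this helps someone.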

import tensorflow as tf
from object_detection.utils import dataset_util

# filename holds the image's full path rather than its base name, so
# image/source_id stays unique even when frames in different folders
# share the same file name.
tf_example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': dataset_util.int64_feature(im_height),
    'image/width': dataset_util.int64_feature(im_width),
    'image/filename': dataset_util.bytes_feature(filename),
    'image/source_id': dataset_util.bytes_feature(filename),
    'image/encoded': dataset_util.bytes_feature(encoded_image_data),
    'image/format': dataset_util.bytes_feature(image_format),
    'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
    'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
    'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
    'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
    'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
    'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
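Before writing the TFRecord, it may help to check whether the ids you plan to use actually collide. The following is a minimal sketch using only the standard library; find_colliding_ids and the example paths are hypothetical and only illustrate the naming conflict described above.

```python
import collections
import os


def find_colliding_ids(image_paths, make_id):
    """Return {id: [paths]} for every id shared by more than one path.

    make_id is the function that turns a path into the value you would
    store in image/source_id (hypothetical helper for illustration).
    """
    paths_by_id = collections.defaultdict(list)
    for path in image_paths:
        paths_by_id[make_id(path)].append(path)
    return {key: paths for key, paths in paths_by_id.items() if len(paths) > 1}


# Hypothetical folder layout: two sequences that reuse frame names.
paths = [
    "train/seq_a/00001.ppm",
    "train/seq_b/00001.ppm",
    "train/seq_b/00002.ppm",
]

# Using only the base name as the id produces a collision.
by_basename = find_colliding_ids(paths, os.path.basename)

# Using the full path as the id keeps every record distinct.
by_full_path = find_colliding_ids(paths, lambda p: p)

print(by_basename)
print(by_full_path)
```

An empty result for the id scheme you choose means no two records share an id within that list of paths.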