MobileNet unusable when is_training is set to False

Asked by Oha*_*eir (tensorflow)

A more accurate description of this problem is that MobileNet performs poorly when is_training is not explicitly set to True. I am referring to the MobileNet that TensorFlow provides in its models repository: https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.py

This is how I create the network (phase_train = True):

with slim.arg_scope(mobilenet_v1.mobilenet_v1_arg_scope(is_training=phase_train)):
    features, endpoints = mobilenet_v1.mobilenet_v1(
        inputs=images_placeholder, features_layer_size=features_layer_size, dropout_keep_prob=dropout_keep_prob,
        is_training=phase_train)

I am training a recognition network, and during training I evaluate on LFW. The results I get during training improve over time and reach good accuracy.

Before deploying, I freeze the graph. If I freeze the graph with is_training=True, the results I get on LFW are the same as during training. However, if I set is_training=False, I get results as if the network had not been trained at all...
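For completeness, this is roughly what the freezing step looks like (a minimal sketch, not my exact script; it assumes the 'embeddings' output node shown further down and an example output path):

from tensorflow.python.framework import graph_util

# convert the trained variables in the current session to constants and serialize the graph
frozen_graph_def = graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), ['embeddings'])
with tf.gfile.GFile('frozen_mobilenet.pb', 'wb') as f:  # example path
    f.write(frozen_graph_def.SerializeToString())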

This behavior actually happens with other networks such as Inception as well.

I tend to think that I am missing something very basic and that this is not a bug in TensorFlow...

Any help would be appreciated.

Adding more code...

This is how I prepare for training:

images_placeholder = tf.placeholder(tf.float32, shape=(None, image_size, image_size, 1), name='input')
labels_placeholder = tf.placeholder(tf.int32, shape=(None))
dropout_placeholder = tf.placeholder_with_default(1.0, shape=(), name='dropout_keep_prob')
phase_train_placeholder = tf.Variable(True, name='phase_train')

global_step = tf.Variable(0, name='global_step', trainable=False)

# build graph

with slim.arg_scope(mobilenet_v1.mobilenet_v1_arg_scope(is_training=phase_train_placeholder)):
    features, endpoints = mobilenet_v1.mobilenet_v1(
        inputs=images_placeholder, features_layer_size=512, dropout_keep_prob=1.0,
        is_training=phase_train_placeholder)

# loss

logits = slim.fully_connected(inputs=features, num_outputs=train_data.get_class_count(), activation_fn=None,
                              weights_initializer=tf.truncated_normal_initializer(stddev=0.1),
                              weights_regularizer=slim.l2_regularizer(scale=0.00005),
                              scope='Logits', reuse=False)

tf.losses.sparse_softmax_cross_entropy(labels=labels_placeholder, logits=logits,
                                       reduction=tf.losses.Reduction.MEAN)

loss = tf.losses.get_total_loss()

# normalize output for inference

embeddings = tf.nn.l2_normalize(features, 1, 1e-10, name='embeddings')

# optimizer

optimizer = tf.train.AdamOptimizer()
train_op = optimizer.minimize(loss, global_step=global_step)

And this is my training step:

batch_data, batch_labels = train_data.next_batch()
feed_dict = {
    images_placeholder: batch_data,
    labels_placeholder: batch_labels,
    dropout_placeholder: dropout_keep_prob
}
_, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)

I could add the code I use to freeze the graph, but it is not really necessary. Building the graph with is_training=False, loading the latest checkpoint and running the evaluation on LFW is enough to reproduce the problem.
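This is roughly what I mean by the reproduction (a sketch; checkpoint_dir is just an example name and the LFW evaluation loop itself is omitted):

with slim.arg_scope(mobilenet_v1.mobilenet_v1_arg_scope(is_training=False)):
    features, endpoints = mobilenet_v1.mobilenet_v1(
        inputs=images_placeholder, features_layer_size=512, dropout_keep_prob=1.0,
        is_training=False)
embeddings = tf.nn.l2_normalize(features, 1, 1e-10, name='embeddings')

saver = tf.train.Saver()
with tf.Session() as sess:
    # restore the latest training checkpoint into the inference graph
    saver.restore(sess, tf.train.latest_checkpoint(checkpoint_dir))
    # feed the LFW pairs through `embeddings` here and compute the verification accuracy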

Update...

I found that the problem comes from the batch normalization layer. Setting just this layer to is_training=False is enough to reproduce the problem.
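For example, something like the following is enough to trigger it (a sketch of the isolation test; it relies on a nested arg_scope overriding the batch_norm settings coming from mobilenet_v1_arg_scope):

# everything stays in training mode except batch_norm, which is forced into inference mode
with slim.arg_scope(mobilenet_v1.mobilenet_v1_arg_scope(is_training=True)):
    with slim.arg_scope([slim.batch_norm], is_training=False):
        features, endpoints = mobilenet_v1.mobilenet_v1(
            inputs=images_placeholder, features_layer_size=512, dropout_keep_prob=1.0,
            is_training=True)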

References I found after tracking this down:

http://ruishu.io/2016/12/27/batchnorm/

https://github.com/tensorflow/tensorflow/issues/10118

Batch Normalization - Tensorflow

Will update with the solution once I have one that is tested.

Answer by Oha*_*eir:

So I found a solution, mainly using this reference: http://ruishu.io/2016/12/27/batchnorm/

From the link:

Note: when is_training is True the moving_mean and moving_variance need to be updated; by default the update_ops are placed in tf.GraphKeys.UPDATE_OPS, so they need to be added as a dependency to the train_op, for example:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
if update_ops:
    updates = tf.group(*update_ops)
    total_loss = control_flow_ops.with_dependencies([updates], total_loss)

And, to get straight to the point, instead of creating the optimizer like this:

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(total_loss, global_step=global_step)

do it like this:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    train_op = optimizer.minimize(total_loss, global_step=global_step)

That will solve the issue.
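A quick way to confirm the fix is active (a sketch; it assumes the default slim batch_norm variable naming and reuses the feed_dict from the training step above):

# the batch-norm moving statistics should now change from one train step to the next
moving_vars = [v for v in tf.global_variables()
               if 'moving_mean' in v.name or 'moving_variance' in v.name]
before = sess.run(moving_vars[0])
sess.run(train_op, feed_dict=feed_dict)
after = sess.run(moving_vars[0])
print('moving statistics updated:', not (before == after).all())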