How do I use batch normalization correctly in TensorFlow?

wid*_*txp 18 deep-learning tensorflow

I have tried several versions of batch_normalization in TensorFlow, but none of them works! When I set batch_size = 1 at inference time, the results are all wrong.

Version 1: directly use the official version from tensorflow.contrib

from tensorflow.contrib.layers.python.layers.layers import batch_norm

Used like this:

output = lrelu(batch_norm(tf.nn.bias_add(conv, biases), is_training), 0.5, name=scope.name)

is_training = True at training time and False at inference time.
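
Note that with the default updates_collections, this official batch_norm only collects the moving-average update ops into tf.GraphKeys.UPDATE_OPS; the training op must be made to depend on them explicitly, or the moving statistics never change. A minimal sketch of that wiring, assuming is_training is a tf.bool placeholder and a hypothetical loss (these names are not from the question):

import tensorflow as tf
from tensorflow.contrib.layers.python.layers.layers import batch_norm

is_training = tf.placeholder(tf.bool, name='is_training')  # fed per sess.run

# ... build the network, calling batch_norm(..., is_training=is_training) ...

# Attach the collected update ops to the train op so they actually run:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)  # `loss` assumed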

Version 2: from "How could I use Batch Normalization in TensorFlow?"

def batch_norm_layer(x, train_phase, scope_bn='bn'):
    bn_train = batch_norm(x, decay=0.999, epsilon=1e-3, center=True, scale=True,
            updates_collections=None,
            is_training=True,
            reuse=None, # is this right?
            trainable=True,
            scope=scope_bn)
    bn_inference = batch_norm(x, decay=0.999, epsilon=1e-3, center=True, scale=True,
            updates_collections=None,
            is_training=False,
            reuse=True, # is this right?
            trainable=True,
            scope=scope_bn)
    z = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference)
    return z

Used like this:

output = lrelu(batch_norm_layer(tf.nn.bias_add(conv, biases), is_training), 0.5, name=scope.name)

is_training is a placeholder; it is True at training time and False at inference time.
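
For completeness, a sketch of how such a placeholder is typically fed at run time (sess, train_op, and output are assumed names, not part of the question):

sess.run(train_op, feed_dict={is_training: True})          # training step
result = sess.run(output, feed_dict={is_training: False})  # inference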

Version 3: from slim https://github.com/tensorflow/models/blob/master/inception/inception/slim/ops.py

from tensorflow.python.training import moving_averages

def batch_norm_layer(inputs,
           is_training=True,
           scope='bn'):
  decay = 0.999
  epsilon = 0.001
  inputs_shape = inputs.get_shape()
  with tf.variable_scope(scope) as t_scope:
    axis = list(range(len(inputs_shape) - 1))
    params_shape = inputs_shape[-1:]
    # Allocate parameters for the beta and gamma of the normalization.
    beta = tf.Variable(tf.zeros(params_shape),
        name='beta',
        trainable=True)
    gamma = tf.Variable(tf.ones(params_shape),
        name='gamma',
        trainable=True)
    # Moving statistics, updated during training and used at inference.
    moving_mean = tf.Variable(tf.zeros(params_shape),
        name='moving_mean',
        trainable=False)
    moving_variance = tf.Variable(tf.ones(params_shape),
        name='moving_variance',
        trainable=False)
    if is_training:
      # Calculate the moments based on the individual batch.
      mean, variance = tf.nn.moments(inputs, axis)

      update_moving_mean = moving_averages.assign_moving_average(
          moving_mean, mean, decay)
      update_moving_variance = moving_averages.assign_moving_average(
          moving_variance, variance, decay)
    else:
      # Just use the moving_mean and moving_variance.
      mean = moving_mean
      variance = moving_variance
    # Normalize the activations.
    outputs = tf.nn.batch_normalization(
        inputs, mean, variance, beta, gamma, epsilon)
    outputs.set_shape(inputs.get_shape())
    return outputs

Used like this:

output = lrelu(batch_norm_layer(tf.nn.bias_add(conv, biases), is_training), 0.5, name=scope.name)

is_training = True at training time and False at inference time.
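
A plausible reason Version 3 misbehaves, and why Version 4 below adds tf.control_dependencies: the two update ops are created, but nothing in the graph depends on them, so an ordinary forward or backward pass never executes them. Sketched against the function above:

out = batch_norm_layer(inputs, is_training=True)
# sess.run(out) normalizes with the fresh batch moments, but it never runs
# update_moving_mean / update_moving_variance, so moving_mean stays at 0 and
# moving_variance at 1 -- inference then normalizes with untrained statistics.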

Version 4: same as Version 3, but with tf.control_dependencies added

from tensorflow.python.training import moving_averages

def batch_norm_layer(inputs,
           decay=0.999,
           center=True,
           scale=True,
           epsilon=0.001,
           moving_vars='moving_vars',
           activation=None,
           is_training=True,
           trainable=True,
           restore=True,
           scope='bn',
           reuse=None):
  inputs_shape = inputs.get_shape()
  with tf.variable_op_scope([inputs], scope, 'BatchNorm', reuse=reuse):
      axis = list(range(len(inputs_shape) - 1))
      params_shape = inputs_shape[-1:]
      # Allocate parameters for the beta and gamma of the normalization.
      beta = tf.Variable(tf.zeros(params_shape), name='beta')
      gamma = tf.Variable(tf.ones(params_shape), name='gamma')
      # Create moving_mean and moving_variance add them to
      # GraphKeys.MOVING_AVERAGE_VARIABLES collections.
      moving_mean = tf.Variable(tf.zeros(params_shape), name='moving_mean',
            trainable=False)
      moving_variance = tf.Variable(tf.ones(params_shape), name='moving_variance',
            trainable=False)
  control_inputs = []
  if is_training:
      # Calculate the moments based on the individual batch.
      mean, variance = tf.nn.moments(inputs, axis)

      update_moving_mean = moving_averages.assign_moving_average(
          moving_mean, mean, decay)
      update_moving_variance = moving_averages.assign_moving_average(
          moving_variance, variance, decay)
      control_inputs = [update_moving_mean, update_moving_variance]
  else:
      # Just use the moving_mean and moving_variance.
      mean = moving_mean
      variance = moving_variance
  # Normalize the activations. 
  with tf.control_dependencies(control_inputs):
      return tf.nn.batch_normalization(
        inputs, mean, variance, beta, gamma, epsilon)

Used like this:

output = lrelu(batch_norm_layer(tf.nn.bias_add(conv, biases), is_training=is_training), 0.5, name=scope.name)

is_training = True at training time and False at inference time.

None of these four versions of batch_normalization is correct. So, how do I use batch normalization correctly?

Another strange phenomenon is that if I set batch_norm_layer to a no-op like this, the inference results are all identical.

def batch_norm_layer(inputs, is_training):
    return inputs

Zho*_*ang 8

I have tested it, and the following simplified implementation of batch normalization gives the same result as tf.contrib.layers.batch_norm as long as the settings are the same.

import tensorflow as tf
from tensorflow.python.training import moving_averages


def initialize_batch_norm(scope, depth):
    with tf.variable_scope(scope) as bnscope:
         gamma = tf.get_variable("gamma", [depth], initializer=tf.constant_initializer(1.0))
         beta = tf.get_variable("beta", [depth], initializer=tf.constant_initializer(0.0))
         moving_avg = tf.get_variable("moving_avg", [depth], initializer=tf.constant_initializer(0.0), trainable=False)
         moving_var = tf.get_variable("moving_var", [depth], initializer=tf.constant_initializer(1.0), trainable=False)
         bnscope.reuse_variables()


def BatchNorm_layer(x, scope, train, epsilon=0.001, decay=.99):
    # Perform a batch normalization after a conv layer or a fc layer
    # gamma: a scale factor
    # beta: an offset
    # epsilon: the variance epsilon - a small float number to avoid dividing by 0
    with tf.variable_scope(scope, reuse=True):
        with tf.variable_scope('BatchNorm', reuse=True) as bnscope:
            gamma, beta = tf.get_variable("gamma"), tf.get_variable("beta")
            moving_avg, moving_var = tf.get_variable("moving_avg"), tf.get_variable("moving_var")
            shape = x.get_shape().as_list()
            control_inputs = []
            if train:
                avg, var = tf.nn.moments(x, list(range(len(shape)-1)))
                update_moving_avg = moving_averages.assign_moving_average(moving_avg, avg, decay)
                update_moving_var = moving_averages.assign_moving_average(moving_var, var, decay)
                control_inputs = [update_moving_avg, update_moving_var]
            else:
                avg = moving_avg
                var = moving_var
            with tf.control_dependencies(control_inputs):
                output = tf.nn.batch_normalization(x, avg, var, offset=beta, scale=gamma, variance_epsilon=epsilon)
    return output

The main tricks to using the official batch-normalization implementation `tf.contrib.layers.batch_norm` are: (1) set `is_training=True` at training time and `is_training=False` at validation and test time; (2) set `updates_collections=None` to make sure `moving_variance` and `moving_mean` are updated in place; (3) be aware of and careful with the variable-scope settings; (4) set `decay` smaller than the default (e.g. `decay=0.9` or `decay=0.99`) if your dataset is small or your total number of training updates/steps is not that large (the default is `decay=0.999`).
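
A minimal usage sketch that pulls those four points together (names such as conv, train_op, and sess are assumptions added for illustration, not part of the answer):

import tensorflow as tf

is_training = tf.placeholder(tf.bool, name='is_training')    # trick (1)

h = tf.contrib.layers.batch_norm(
        conv,                      # `conv` assumed: output of a conv layer
        decay=0.9,                 # trick (4): smaller decay for short runs
        center=True, scale=True,
        updates_collections=None,  # trick (2): update moving stats in place
        is_training=is_training,   # trick (1): a placeholder, fed per step
        scope='bn')                # trick (3): fixed scope, so the same
                                   # variables are reused on every call

# sess.run(train_op, feed_dict={is_training: True, ...})     # training
# sess.run(h, feed_dict={is_training: False, ...})           # inference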

  • I keep having trouble with `tf.contrib.layers.batch_norm`. My network converges during training, but when I test it with `is_training=False` it gives me nonsensical results. The test results with `is_training=True` make more sense to me (even though the accuracy gain over a network without batch_norm is almost zero). Any ideas? I asked about it here: [Tensorflow batch_norm does not work properly when testing (is_training=False)](http://stackoverflow.com/questions/42770757/tensorflow-batch-norm-does-not-work-properly-when-testing-is-training-false) (3 upvotes)