tf.layers.batch_normalization parameters

edn*_*edn 2 python machine-learning neural-network tensorflow

I'm not sure whether it's just me who finds the TensorFlow documentation a bit weak.

I was planning to use the tf.nn.batch_normalization function to implement batch normalization, but then I noticed that tf.layers.batch_normalization appears to be a simpler function for doing the same thing. But, if I may say so, its documentation is really poor.

I've tried to understand how to use it correctly, but based on the information provided on the page, that is really not easy. I'm hoping that some others have experience with it and can help me (and probably many others) understand it.

Let me first share the interface (the function signature):

tf.layers.batch_normalization(
    inputs,
    axis=-1,
    momentum=0.99,
    epsilon=0.001,
    center=True,
    scale=True,
    beta_initializer=tf.zeros_initializer(),
    gamma_initializer=tf.ones_initializer(),
    moving_mean_initializer=tf.zeros_initializer(),
    moving_variance_initializer=tf.ones_initializer(),
    beta_regularizer=None,
    gamma_regularizer=None,
    beta_constraint=None,
    gamma_constraint=None,
    training=False,
    trainable=True,
    name=None,
    reuse=None,
    renorm=False,
    renorm_clipping=None,
    renorm_momentum=0.99,
    fused=None,
    virtual_batch_size=None,
    adjustment=None
)

Q1) The beta values are initialized to zero and the gamma values to one, but nothing is said about why. When batch normalization is used, I understand that the ordinary bias parameters of a neural network become obsolete, since the beta parameter in the batch-normalization step does the same job. From that perspective, initializing beta to zero is understandable. But why are the gamma values initialized to 1? Is that really the most effective choice?

Q2) I also see a momentum parameter there. The documentation only says "Momentum for the moving average." I assume this parameter is used when computing the "mean" values for a given mini-batch in the corresponding hidden layer. In other words, the mean used in batch normalization is not the mean of the current mini-batch alone; it is effectively a running mean over roughly the last 100 mini-batches (since momentum = 0.99). But it is not clear how this parameter affects execution at test time, or when I simply validate my model on the dev set by computing cost and accuracy. My assumption is that when I process the test set and the dev set, I set the "training" parameter to False, so that the momentum parameter becomes irrelevant for that particular execution, and the "mean" and "variance" values computed during training are used instead of computing new ones. That is how it should work if I'm not mistaken, but if that is the case, I don't see anything about it in the documentation. Can anyone confirm that my understanding is correct? If not, I would greatly appreciate a further explanation.
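To make my assumption about momentum concrete, here is a toy sketch of how I understand the exponential moving average to work (plain NumPy, my own variable names, not TensorFlow's actual implementation):

import numpy as np

momentum = 0.99
moving_mean = 0.0  # corresponds to moving_mean_initializer (zeros)

for step in range(1000):
    batch = np.random.randn(32) + 5.0   # toy mini-batch with true mean 5
    batch_mean = batch.mean()
    # exponential moving average: new = momentum * old + (1 - momentum) * batch
    moving_mean = momentum * moving_mean + (1.0 - momentum) * batch_mean

print(moving_mean)  # approaches 5.0; roughly the last ~100 batches dominate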

Q3) I have a hard time attaching meaning to the trainable parameter. I assume it refers to the beta and gamma parameters here. Why would they not be trainable?

Q4) The "reuse" parameter. What exactly is it?

Q5) The adjustment parameter. Another mystery...

Q6) A kind of summary question... This is my overall assumption that I'd like confirmed. The important parameters here are:

- inputs
- axis
- momentum
- center
- scale
- training

I assume that as long as training=True during training, we are safe. And we are also safe as long as training=False when validating on the dev set or test set, or even when using the model in real life. (See the sketch below for the pattern I have in mind.)
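To make this concrete, here is the minimal usage pattern I believe is intended (a sketch: the update-ops dependency is mentioned in the layer's docstring, while the placeholder shapes and layer sizes here are made up for illustration):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 5])   # (batch, features)
is_training = tf.placeholder(tf.bool, shape=[])   # feed True / False per run

h = tf.layers.dense(x, 6)
h = tf.layers.batch_normalization(h, axis=-1, training=is_training)
h = tf.nn.relu(h)
logits = tf.layers.dense(h, 2)

loss = tf.reduce_mean(tf.square(logits))          # dummy loss for the sketch

# The moving_mean/moving_variance updates are added to UPDATE_OPS and must be
# run explicitly, otherwise inference (training=False) uses stale statistics.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)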

Any feedback would be greatly appreciated.

Addendum:

The confusion continues. Help!

I'm trying to use this function instead of implementing the batch normalizer manually. I have the following forward-propagation function, which loops over the layers of the NN.

def forward_propagation_with_relu(X, num_units_in_layers, parameters, 
                                  normalize_batch, training, mb_size=7):

    L = len(num_units_in_layers)

    A_temp = tf.transpose(X)

    for i in range(1, L):
        W = parameters.get("W"+str(i))
        b = parameters.get("b"+str(i))
        Z_temp = tf.add(tf.matmul(W, A_temp), b)

        if normalize_batch:
            if (i < (L-1)):  
                with tf.variable_scope("batch_norm_scope", reuse=tf.AUTO_REUSE):
                    Z_temp = tf.layers.batch_normalization(Z_temp, axis=-1, 
                                                           training=training)

        A_temp = tf.nn.relu(Z_temp)

    return Z_temp   # This is the linear output of the last layer

The tf.layers.batch_normalization(..) function wants the input to have static dimensions, but in my case it doesn't.

Since I feed mini-batches each time before running the optimizer, instead of training on the whole training set at once, the batch dimension of X appears to be unknown.
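If I sketch the shapes myself (a toy reproduction of my own, not anything from the docs), the transpose seems to be what moves the unknown dimension under axis=-1:

import tensorflow as tf

X = tf.placeholder(tf.float32, shape=[None, 5])  # (batch, features)
A = tf.transpose(X)                              # (features, batch)
print(A.shape)   # (5, ?) -> axis=-1 is now the dynamic batch dimension

So in my forward-prop function, Z_temp has shape (num_units, batch), and axis=-1 points at the dimension that is None, which seems to match the "undefined axis dimension" error below.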

If I write:

print(X.shape)

I get:

(?, 5)

In this case, when I run the whole program, the error below appears.

I saw in some other posts that people said they could solve the problem by using the tf.reshape function. I'll give it a try... Forward prop then runs fine, but it crashes later in the Adam optimizer...

Here is what I get when I run the code above (without using tf.reshape):

How can I solve this???

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-191-990fb7d7f7f6> in <module>()
     24 parameters = nn_model(train_input_paths, dev_input_paths, test_input_paths, learning_rate, num_train_epochs,
     25                       normalize_batch, epoch_period_to_save_cost, minibatch_size, num_units_in_layers,
---> 26                       lambd, print_progress)
     27 
     28 print(parameters)

<ipython-input-190-59594e979129> in nn_model(train_input_paths, dev_input_paths, test_input_paths, learning_rate, num_train_epochs, normalize_batch, epoch_period_to_save_cost, minibatch_size, num_units_in_layers, lambd, print_progress)
     34         # Forward propagation: Build the forward propagation in the tensorflow graph
     35         ZL = forward_propagation_with_relu(X_mini_batch, num_units_in_layers, 
---> 36                                            parameters, normalize_batch, training)
     37 
     38     with tf.name_scope("calc_cost"):

<ipython-input-187-8012e2fb6236> in forward_propagation_with_relu(X, num_units_in_layers, parameters, normalize_batch, training, mb_size)
     15                 with tf.variable_scope("batch_norm_scope", reuse=tf.AUTO_REUSE):
     16                     Z_temp = tf.layers.batch_normalization(Z_temp, axis=-1, 
---> 17                                                            training=training)
     18 
     19         A_temp = tf.nn.relu(Z_temp)

~/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py in batch_normalization(inputs, axis, momentum, epsilon, center, scale, beta_initializer, gamma_initializer, moving_mean_initializer, moving_variance_initializer, beta_regularizer, gamma_regularizer, beta_constraint, gamma_constraint, training, trainable, name, reuse, renorm, renorm_clipping, renorm_momentum, fused, virtual_batch_size, adjustment)
    775       _reuse=reuse,
    776       _scope=name)
--> 777   return layer.apply(inputs, training=training)
    778 
    779 

~/.local/lib/python3.5/site-packages/tensorflow/python/layers/base.py in apply(self, inputs, *args, **kwargs)
    805       Output tensor(s).
    806     """
--> 807     return self.__call__(inputs, *args, **kwargs)
    808 
    809   def _add_inbound_node(self,

~/.local/lib/python3.5/site-packages/tensorflow/python/layers/base.py in __call__(self, inputs, *args, **kwargs)
    676           self._defer_regularizers = True
    677           with ops.init_scope():
--> 678             self.build(input_shapes)
    679           # Create any regularizers added by `build`.
    680           self._maybe_create_variable_regularizers()

~/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py in build(self, input_shape)
    251       if axis_to_dim[x] is None:
    252         raise ValueError('Input has undefined `axis` dimension. Input shape: ',
--> 253                          input_shape)
    254     self.input_spec = base.InputSpec(ndim=ndims, axes=axis_to_dim)
    255 

ValueError: ('Input has undefined `axis` dimension. Input shape: ', TensorShape([Dimension(6), Dimension(None)]))

This feels hopeless...

Addendum (2)

Let me add some more information:

The line below simply means that the input layer has 5 units, each hidden layer has 6 units, and the output layer has 2 units.

num_units_in_layers = [5,6,6,2] 

Here is the updated version of the forward-prop function, with tf.reshape:

def forward_propagation_with_relu(X, num_units_in_layers, parameters, 
                                  normalize_batch, training, mb_size=7):

    L = len(num_units_in_layers)
    print("X.shape before reshape: ", X.shape)             # ADDED LINE 1
    X = tf.reshape(X, [mb_size, num_units_in_layers[0]])   # ADDED LINE 2
    print("X.shape after reshape: ", X.shape)              # ADDED LINE 3
    A_temp = tf.transpose(X)

    for i in range(1, L):
        W = parameters.get("W"+str(i))
        b = parameters.get("b"+str(i))
        Z_temp = tf.add(tf.matmul(W, A_temp), b)

        if normalize_batch:
            if (i < (L-1)):  
                with tf.variable_scope("batch_norm_scope", reuse=tf.AUTO_REUSE):
                    Z_temp = tf.layers.batch_normalization(Z_temp, axis=-1, 
                                                           training=training)

        A_temp = tf.nn.relu(Z_temp)

    return Z_temp   # This is the linear output of the last layer

With this change, the forward-prop function runs. But execution seems to crash later on. Here is the error I get. (Note that I print the shape of the input X before and after the reshape inside the forward-prop function.)

X.shape before reshape:  (?, 5)
X.shape after reshape:  (7, 5)

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1349     try:
-> 1350       return fn(*args)
   1351     except errors.OpError as e:

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1328                                    feed_dict, fetch_list, target_list,
-> 1329                                    status, run_metadata)
   1330 

~/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    515             compat.as_text(c_api.TF_Message(self.status.status)),
--> 516             c_api.TF_GetCode(self.status.status))
    517     # Delete the underlying status object from memory otherwise it stays alive

InvalidArgumentError: Incompatible shapes: [7] vs. [2]
     [[Node: forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub = Sub[T=DT_FLOAT, _class=["loc:@batch_norm_scope/batch_normalization/moving_mean"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](forward_prop/batch_norm_scope/batch_normalization/cond_2/Switch_1:1, forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub/Switch_1:1)]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-222-990fb7d7f7f6> in <module>()
     24 parameters = nn_model(train_input_paths, dev_input_paths, test_input_paths, learning_rate, num_train_epochs,
     25                       normalize_batch, epoch_period_to_save_cost, minibatch_size, num_units_in_layers,
---> 26                       lambd, print_progress)
     27 
     28 print(parameters)

<ipython-input-221-59594e979129> in nn_model(train_input_paths, dev_input_paths, test_input_paths, learning_rate, num_train_epochs, normalize_batch, epoch_period_to_save_cost, minibatch_size, num_units_in_layers, lambd, print_progress)
     88                                                                         cost_mini_batch,
     89                                                                         accuracy_mini_batch],
---> 90                                                                         feed_dict={training: True})
     91                       nr_of_minibatches += 1
     92                       sum_minibatch_costs += minibatch_cost

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    893     try:
    894       result = self._run(None, fetches, feed_dict, options_ptr,
--> 895                          run_metadata_ptr)
    896       if run_metadata:
    897         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1126     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1127       results = self._do_run(handle, final_targets, final_fetches,
-> 1128                              feed_dict_tensor, options, run_metadata)
   1129     else:
   1130       results = []

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1342     if handle is None:
   1343       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1344                            options, run_metadata)
   1345     else:
   1346       return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1361         except KeyError:
   1362           pass
-> 1363       raise type(e)(node_def, op, message)
   1364 
   1365   def _extend_graph(self):

InvalidArgumentError: Incompatible shapes: [7] vs. [2]
     [[Node: forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub = Sub[T=DT_FLOAT, _class=["loc:@batch_norm_scope/batch_normalization/moving_mean"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](forward_prop/batch_norm_scope/batch_normalization/cond_2/Switch_1:1, forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub/Switch_1:1)]]

Caused by op 'forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub', defined at:
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 478, in start
    self.io_loop.start()
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
    handler(stream, idents, msg)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2728, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2850, in run_ast_nodes
    if self.run_code(code, result):
  File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-222-990fb7d7f7f6>", line 26, in <module>
    lambd, print_progress)
  File "<ipython-input-221-59594e979129>", line 36, in nn_model
    parameters, normalize_batch, training)
  File "<ipython-input-218-62e4c6126c2c>", line 19, in forward_propagation_with_relu
    training=training)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py", line 777, in batch_normalization
    return layer.apply(inputs, training=training)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/base.py", line 807, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/base.py", line 697, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py", line 602, in call
    lambda: self.moving_mean)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/utils.py", line 211, in smart_cond
    return control_flow_ops.cond(pred, true_fn=fn1, false_fn=fn2, name=name)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 316, in new_func
    return func(*args, **kwargs)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1985, in cond
    orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1839, in BuildCondBranch
    original_result = fn()
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py", line 601, in <lambda>
    lambda: _do_update(self.moving_mean, new_mean),
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py", line 597, in _do_update
    var, value, self.momentum, zero_debias=False)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/training/moving_averages.py", line 87, in assign_moving_average
    update_delta = (variable - value) * decay
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 778, in _run_op
    return getattr(ops.Tensor, operator)(a._AsTensor(), *args)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 934, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4819, in _sub
    "Sub", x=x, y=y, name=name)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3267, in create_op
    op_def=op_def)
  File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Incompatible shapes: [7] vs. [2]
     [[Node: forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub = Sub[T=DT_FLOAT, _class=["loc:@batch_norm_scope/batch_normalization/moving_mean"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](forward_prop/batch_norm_scope/batch_normalization/cond_2/Switch_1:1, forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub/Switch_1:1)]]

As for the question of why the shape of X is not static... I don't know... This is how I set up the dataset.

with tf.name_scope("next_train_batch"):
    filenames = tf.placeholder(tf.string, shape=[None])
    dataset = tf.data.Dataset.from_tensor_slices(filenames)
    dataset = dataset.flat_map(lambda filename: tf.data.TextLineDataset(filename).skip(1).map(decode_csv))
    dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(minibatch_size)
    iterator = dataset.make_initializable_iterator()
    X_mini_batch, Y_mini_batch = iterator.get_next()

I have 2 CSV files with training data.

train_path1 = "train1.csv"
train_path2 = "train2.csv"
train_input_paths = [train_path1, train_path2]

I use an initializable iterator.
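For completeness, this is how I initialize the iterator for each epoch (an abbreviated sketch; sess here is my tf.Session):

sess.run(iterator.initializer, feed_dict={filenames: train_input_paths})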

Ric*_*wth 5

Q1) Initializing gamma to 1 and beta to 0 means using the normalized input directly. Since there is no prior information about what the variance of a layer's output should be, assuming a standard Gaussian is fair enough.

Q2) During the training phase (training=True), the batch is normalized with its own mean and variance, under the assumption that the training data are sampled randomly. During testing (training=False), the test data may be sampled arbitrarily, so we cannot use their mean and variance. Therefore, as you said, the moving-average estimates from the last "100" training iterations are used instead.
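Concretely, at inference time the layer applies the standard transform from the batch-norm paper, but with the accumulated statistics in place of the batch statistics; a plain-Python sketch (my own function, not TensorFlow's code):

import numpy as np

def batch_norm_inference(x, moving_mean, moving_var, gamma, beta, epsilon=0.001):
    # Normalize with the statistics accumulated during training,
    # then rescale and shift with the learned gamma and beta.
    x_hat = (x - moving_mean) / np.sqrt(moving_var + epsilon)
    return gamma * x_hat + beta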

Q3) Yes, trainable refers to beta and gamma. There are cases where you need to set trainable=False, for example if a novel method is used to update the parameters, or if the batch-norm layer is pre-trained and needs to be frozen.

Q4) You have probably noticed the reuse parameter in other tf.layers functions as well. In general, if you want to call a layer more than once (e.g., for training and validation) and you don't want TensorFlow to think you are creating a new layer each time, you can set reuse=True. I prefer with tf.variable_scope(..., reuse=tf.AUTO_REUSE): to achieve the same thing.
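A minimal sketch of that pattern (the model and names are made up for illustration):

import tensorflow as tf

x_train = tf.placeholder(tf.float32, shape=[None, 5])
x_eval = tf.placeholder(tf.float32, shape=[None, 5])

def model(x, training):
    # AUTO_REUSE creates the variables on the first call and reuses them afterwards.
    with tf.variable_scope("net", reuse=tf.AUTO_REUSE):
        h = tf.layers.batch_normalization(x, axis=-1, training=training)
        return tf.layers.dense(h, 2)

train_out = model(x_train, training=True)   # creates beta/gamma/moving stats
eval_out = model(x_eval, training=False)    # reuses the same variables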

Q5) I'm not sure about this one. I guess it is meant for users who want to design novel tricks for adjusting the scale and bias.

Q6) Yes, you are right.