I'm not sure whether I'm the only one who finds the TensorFlow documentation a bit weak.
I was planning to use the tf.nn.batch_normalization function to implement batch normalization, but then noticed tf.layers.batch_normalization, which appears to be a convenience wrapper around it. If I may say so, though, the documentation is really poor.
I have tried to work out how to use it properly, but that is genuinely difficult with the information provided on the page. I'm hoping someone with experience can help me (and probably many others) understand it.
First, the function signature:
tf.layers.batch_normalization(
inputs,
axis=-1,
momentum=0.99,
epsilon=0.001,
center=True,
scale=True,
beta_initializer=tf.zeros_initializer(),
gamma_initializer=tf.ones_initializer(),
moving_mean_initializer=tf.zeros_initializer(),
moving_variance_initializer=tf.ones_initializer(),
beta_regularizer=None,
gamma_regularizer=None,
beta_constraint=None,
gamma_constraint=None,
training=False,
trainable=True,
name=None,
reuse=None,
renorm=False,
renorm_clipping=None,
renorm_momentum=0.99,
fused=None,
virtual_batch_size=None,
adjustment=None
)
Q1) Beta is initialized to zero and gamma to one, but no reason is given. I understand that when batch normalization is used, the network's ordinary bias parameters become obsolete, since the beta parameter of the batch-norm step does the same job. From that perspective, initializing beta to zero makes sense. But why is gamma initialized to 1? Is that really the most effective choice?
Q2) There is also a momentum parameter. The documentation only says "Momentum for the moving average.". I assume this parameter is involved when computing the "mean" value for a mini-batch in the corresponding hidden layer. In other words, the mean used in batch normalization is not just the mean of the current mini-batch; it is essentially a running average over roughly the last 100 mini-batches (since momentum = 0.99). But it is unclear how this parameter affects execution at test time, or when I simply evaluate my model on the dev set by computing cost and accuracy. My assumption is that when I process the test and dev sets I set the "training" parameter to False, so that momentum becomes irrelevant for that particular run, and the "mean" and "variance" values computed during training are used instead of computing fresh ones. That is how it should work if I'm not mistaken, but if so, I can't see it stated anywhere in the documentation. Can anyone confirm that my understanding is correct? If not, I would greatly appreciate further explanation.
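To make my assumption about momentum concrete, here is a plain-Python sketch (my own illustration, not the actual TF implementation) of how I picture the moving mean being maintained:

momentum = 0.99              # the layer's default
moving_mean = 0.0            # cf. moving_mean_initializer=tf.zeros_initializer()
for batch_mean in [2.0, 1.8, 2.2, 2.1]:   # made-up per-mini-batch means
    # each training step blends the running estimate with the current batch mean
    moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean
print(moving_mean)  # still far from ~2.0; with momentum=0.99 it converges slowly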
Q3) I have a hard time making sense of the trainable parameter. I assume it refers to the beta and gamma parameters here. Why would they ever not be trainable?
Q4) The "reuse" parameter. What exactly is it?
Q5) The adjustment parameter. Another mystery..
Q6) Something of a summary question, and the overall assumption I would like confirmed: the important parameters here are inputs, axis, momentum, center, scale, and training. I assume we are safe as long as training=True while training, and equally safe with training=False when validating on the dev or test set, or when using the model in real life. A sketch of the pattern I mean follows below.
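To make that concrete, here is a minimal sketch of the pattern (layer sizes and names are my own, not from my real model; note that, per the TF 1.x documentation, the moving-average update ops are collected in tf.GraphKeys.UPDATE_OPS and must be run together with the train step):

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 5])
training = tf.placeholder(tf.bool)            # True while training, False otherwise

h = tf.layers.dense(x, 6)
h = tf.layers.batch_normalization(h, axis=-1, training=training)
h = tf.nn.relu(h)
loss = tf.reduce_mean(tf.square(h))           # dummy loss, just for the sketch

# The moving mean/variance updates are separate ops; tie them to the train op.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.randn(7, 5).astype(np.float32)
    sess.run(train_op, feed_dict={x: batch, training: True})   # train step
    sess.run(loss, feed_dict={x: batch, training: False})      # eval step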
Any feedback would be greatly appreciated.
Appendix:
The confusion continues. Help!
I am trying to use this function instead of implementing the batch normalizer by hand. I have the following forward propagation function, which loops over the layers of the NN.
def forward_propagation_with_relu(X, num_units_in_layers, parameters,
                                  normalize_batch, training, mb_size=7):
    L = len(num_units_in_layers)
    A_temp = tf.transpose(X)
    for i in range(1, L):
        W = parameters.get("W" + str(i))
        b = parameters.get("b" + str(i))
        Z_temp = tf.add(tf.matmul(W, A_temp), b)
        if normalize_batch:
            if i < (L - 1):
                with tf.variable_scope("batch_norm_scope", reuse=tf.AUTO_REUSE):
                    Z_temp = tf.layers.batch_normalization(Z_temp, axis=-1,
                                                           training=training)
        A_temp = tf.nn.relu(Z_temp)
    return Z_temp  # This is the linear output of the last layer
The tf.layers.batch_normalization(..) function wants static dimensions, which in my case it does not get. Since I feed one mini-batch at a time before each optimizer run, rather than the whole training set, the batch dimension of X appears to be unknown.
If I write:
print(X.shape)
I get:
(?, 5)
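If I read the docs correctly (this is my own sketch, so treat it as an assumption), axis names the dimension that holds the features; the layer creates one beta/gamma/moving statistic per entry along that axis, so that dimension must have a static size at graph-construction time:

import tensorflow as tf

x_bf = tf.placeholder(tf.float32, [None, 5])        # (batch, features)
ok1 = tf.layers.batch_normalization(x_bf, axis=-1)  # fine: axis -1 has size 5

x_fb = tf.placeholder(tf.float32, [5, None])        # (features, batch)
ok2 = tf.layers.batch_normalization(x_fb, axis=0)   # fine: axis 0 has size 5
# axis=-1 on x_fb would raise "Input has undefined `axis` dimension",
# because axis -1 is then the unknown batch dimension.

Since my Z_temp is (units, batch) after the transpose, axis=-1 points at the batch dimension, which is exactly the None dimension in the error below.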
In that case, the following error occurs when I run the whole program.
I saw in some other posts that people could solve the problem by using the tf.reshape function. I will give it a try.. The forward prop then runs fine, but afterwards it crashes in the Adam optimizer..
Here is what I get when I run the code above (without tf.reshape):
How can I solve this problem???
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-191-990fb7d7f7f6> in <module>()
24 parameters = nn_model(train_input_paths, dev_input_paths, test_input_paths, learning_rate, num_train_epochs,
25 normalize_batch, epoch_period_to_save_cost, minibatch_size, num_units_in_layers,
---> 26 lambd, print_progress)
27
28 print(parameters)
<ipython-input-190-59594e979129> in nn_model(train_input_paths, dev_input_paths, test_input_paths, learning_rate, num_train_epochs, normalize_batch, epoch_period_to_save_cost, minibatch_size, num_units_in_layers, lambd, print_progress)
34 # Forward propagation: Build the forward propagation in the tensorflow graph
35 ZL = forward_propagation_with_relu(X_mini_batch, num_units_in_layers,
---> 36 parameters, normalize_batch, training)
37
38 with tf.name_scope("calc_cost"):
<ipython-input-187-8012e2fb6236> in forward_propagation_with_relu(X, num_units_in_layers, parameters, normalize_batch, training, mb_size)
15 with tf.variable_scope("batch_norm_scope", reuse=tf.AUTO_REUSE):
16 Z_temp = tf.layers.batch_normalization(Z_temp, axis=-1,
---> 17 training=training)
18
19 A_temp = tf.nn.relu(Z_temp)
~/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py in batch_normalization(inputs, axis, momentum, epsilon, center, scale, beta_initializer, gamma_initializer, moving_mean_initializer, moving_variance_initializer, beta_regularizer, gamma_regularizer, beta_constraint, gamma_constraint, training, trainable, name, reuse, renorm, renorm_clipping, renorm_momentum, fused, virtual_batch_size, adjustment)
775 _reuse=reuse,
776 _scope=name)
--> 777 return layer.apply(inputs, training=training)
778
779
~/.local/lib/python3.5/site-packages/tensorflow/python/layers/base.py in apply(self, inputs, *args, **kwargs)
805 Output tensor(s).
806 """
--> 807 return self.__call__(inputs, *args, **kwargs)
808
809 def _add_inbound_node(self,
~/.local/lib/python3.5/site-packages/tensorflow/python/layers/base.py in __call__(self, inputs, *args, **kwargs)
676 self._defer_regularizers = True
677 with ops.init_scope():
--> 678 self.build(input_shapes)
679 # Create any regularizers added by `build`.
680 self._maybe_create_variable_regularizers()
~/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py in build(self, input_shape)
251 if axis_to_dim[x] is None:
252 raise ValueError('Input has undefined `axis` dimension. Input shape: ',
--> 253 input_shape)
254 self.input_spec = base.InputSpec(ndim=ndims, axes=axis_to_dim)
255
ValueError: ('Input has undefined `axis` dimension. Input shape: ', TensorShape([Dimension(6), Dimension(None)]))
This feels hopeless..
Appendix (2)
I am adding some more information:
The following simply means that the input layer has 5 units, each hidden layer has 6 units, and the output layer has 2 units.
num_units_in_layers = [5,6,6,2]
Here is the updated version of the forward prop function, now with tf.reshape:
def forward_propagation_with_relu(X, num_units_in_layers, parameters,
                                  normalize_batch, training, mb_size=7):
    L = len(num_units_in_layers)
    print("X.shape before reshape: ", X.shape)            # ADDED LINE 1
    X = tf.reshape(X, [mb_size, num_units_in_layers[0]])  # ADDED LINE 2
    print("X.shape after reshape: ", X.shape)             # ADDED LINE 3
    A_temp = tf.transpose(X)
    for i in range(1, L):
        W = parameters.get("W" + str(i))
        b = parameters.get("b" + str(i))
        Z_temp = tf.add(tf.matmul(W, A_temp), b)
        if normalize_batch:
            if i < (L - 1):
                with tf.variable_scope("batch_norm_scope", reuse=tf.AUTO_REUSE):
                    Z_temp = tf.layers.batch_normalization(Z_temp, axis=-1,
                                                           training=training)
        A_temp = tf.nn.relu(Z_temp)
    return Z_temp  # This is the linear output of the last layer
When I do this, I can run the forward prop function, but it seems to crash later in the execution. Here is the error I get. (Note that I print the shape of the input X before and after the reshape inside the forward prop function.)
X.shape before reshape: (?, 5)
X.shape after reshape: (7, 5)
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1349 try:
-> 1350 return fn(*args)
1351 except errors.OpError as e:
~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1328 feed_dict, fetch_list, target_list,
-> 1329 status, run_metadata)
1330
~/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
515 compat.as_text(c_api.TF_Message(self.status.status)),
--> 516 c_api.TF_GetCode(self.status.status))
517 # Delete the underlying status object from memory otherwise it stays alive
InvalidArgumentError: Incompatible shapes: [7] vs. [2]
[[Node: forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub = Sub[T=DT_FLOAT, _class=["loc:@batch_norm_scope/batch_normalization/moving_mean"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](forward_prop/batch_norm_scope/batch_normalization/cond_2/Switch_1:1, forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub/Switch_1:1)]]
During handling of the above exception, another exception occurred:
InvalidArgumentError Traceback (most recent call last)
<ipython-input-222-990fb7d7f7f6> in <module>()
24 parameters = nn_model(train_input_paths, dev_input_paths, test_input_paths, learning_rate, num_train_epochs,
25 normalize_batch, epoch_period_to_save_cost, minibatch_size, num_units_in_layers,
---> 26 lambd, print_progress)
27
28 print(parameters)
<ipython-input-221-59594e979129> in nn_model(train_input_paths, dev_input_paths, test_input_paths, learning_rate, num_train_epochs, normalize_batch, epoch_period_to_save_cost, minibatch_size, num_units_in_layers, lambd, print_progress)
88 cost_mini_batch,
89 accuracy_mini_batch],
---> 90 feed_dict={training: True})
91 nr_of_minibatches += 1
92 sum_minibatch_costs += minibatch_cost
~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
893 try:
894 result = self._run(None, fetches, feed_dict, options_ptr,
--> 895 run_metadata_ptr)
896 if run_metadata:
897 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1126 if final_fetches or final_targets or (handle and feed_dict_tensor):
1127 results = self._do_run(handle, final_targets, final_fetches,
-> 1128 feed_dict_tensor, options, run_metadata)
1129 else:
1130 results = []
~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1342 if handle is None:
1343 return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1344 options, run_metadata)
1345 else:
1346 return self._do_call(_prun_fn, self._session, handle, feeds, fetches)
~/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1361 except KeyError:
1362 pass
-> 1363 raise type(e)(node_def, op, message)
1364
1365 def _extend_graph(self):
InvalidArgumentError: Incompatible shapes: [7] vs. [2]
[[Node: forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub = Sub[T=DT_FLOAT, _class=["loc:@batch_norm_scope/batch_normalization/moving_mean"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](forward_prop/batch_norm_scope/batch_normalization/cond_2/Switch_1:1, forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub/Switch_1:1)]]
Caused by op 'forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub', defined at:
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 478, in start
self.io_loop.start()
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/eventloop/ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tornado/ioloop.py", line 888, in start
handler_func(fd_obj, events)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tornado/stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tornado/stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 233, in dispatch_shell
handler(stream, idents, msg)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/ipkernel.py", line 208, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/ipykernel/zmqshell.py", line 537, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2728, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2850, in run_ast_nodes
if self.run_code(code, result):
File "/home/cesncn/anaconda3/envs/tensorflow/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-222-990fb7d7f7f6>", line 26, in <module>
lambd, print_progress)
File "<ipython-input-221-59594e979129>", line 36, in nn_model
parameters, normalize_batch, training)
File "<ipython-input-218-62e4c6126c2c>", line 19, in forward_propagation_with_relu
training=training)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py", line 777, in batch_normalization
return layer.apply(inputs, training=training)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/base.py", line 807, in apply
return self.__call__(inputs, *args, **kwargs)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/base.py", line 697, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py", line 602, in call
lambda: self.moving_mean)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/utils.py", line 211, in smart_cond
return control_flow_ops.cond(pred, true_fn=fn1, false_fn=fn2, name=name)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 316, in new_func
return func(*args, **kwargs)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1985, in cond
orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1839, in BuildCondBranch
original_result = fn()
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py", line 601, in <lambda>
lambda: _do_update(self.moving_mean, new_mean),
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/layers/normalization.py", line 597, in _do_update
var, value, self.momentum, zero_debias=False)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/training/moving_averages.py", line 87, in assign_moving_average
update_delta = (variable - value) * decay
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 778, in _run_op
return getattr(ops.Tensor, operator)(a._AsTensor(), *args)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 934, in binary_op_wrapper
return func(x, y, name=name)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4819, in _sub
"Sub", x=x, y=y, name=name)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3267, in create_op
op_def=op_def)
File "/home/cesncn/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Incompatible shapes: [7] vs. [2]
[[Node: forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub = Sub[T=DT_FLOAT, _class=["loc:@batch_norm_scope/batch_normalization/moving_mean"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](forward_prop/batch_norm_scope/batch_normalization/cond_2/Switch_1:1, forward_prop/batch_norm_scope/batch_normalization/cond_2/AssignMovingAvg/sub/Switch_1:1)]]
As for the question of why the shape of X is not static... I don't know... This is how I set up my dataset.
with tf.name_scope("next_train_batch"):
    filenames = tf.placeholder(tf.string, shape=[None])
    dataset = tf.data.Dataset.from_tensor_slices(filenames)
    dataset = dataset.flat_map(
        lambda filename: tf.data.TextLineDataset(filename).skip(1).map(decode_csv))
    dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(minibatch_size)
    iterator = dataset.make_initializable_iterator()
    X_mini_batch, Y_mini_batch = iterator.get_next()
I have 2 csv files containing the training data.
train_path1 = "train1.csv"
train_path2 = "train2.csv"
train_input_paths = [train_path1, train_path2]
I use an initializable iterator.
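For what it's worth, my understanding is that Dataset.batch() leaves the batch dimension as None because the final mini-batch of an epoch can be smaller than minibatch_size. If that is right, something like the following (my own guess, with my layer sizes of 5 inputs and 2 outputs hard-coded) would pin the static shape, at the price of shape errors downstream if a short final batch ever arrives:

# Assumption: every batch really contains exactly `minibatch_size` rows.
# set_shape only adds static shape information; it does not move any data.
X_mini_batch.set_shape([minibatch_size, 5])
Y_mini_batch.set_shape([minibatch_size, 2])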
Answer:
Q1) Initializing gamma to 1 and beta to 0 means the normalized input is used directly: the layer computes gamma * (x - mean) / sqrt(variance + epsilon) + beta, which with gamma=1 and beta=0 reduces to plain normalization. Since there is no prior information about what the variance of a layer's output should be, assuming a standard Gaussian is fair enough.
Q2) During the training phase (training=True), the batch is normalized with its own mean and variance, under the assumption that the training data are randomly sampled. During testing (training=False), the test data may be sampled arbitrarily, so we cannot use their mean and variance. Therefore, as you said, the moving-average estimates from the last "100" training iterations are used instead.
Q3) Yes, trainable refers to beta and gamma. There are cases where trainable=False is needed, for example if a novel method is used to update the parameters, or if the batch-norm layer is pre-trained and needs to be frozen.
Q4) You have probably noticed the reuse parameter in other tf.layers functions as well. In general, if you want to call a layer more than once (e.g. for training and validation) and you do not want TensorFlow to think you are creating a brand-new layer, you can set reuse=True. I prefer with tf.variable_scope(..., reuse=tf.AUTO_REUSE): to achieve the same thing.
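For instance, here is a minimal sketch (the names are made up) in which the second call reuses the variables created by the first:

import tensorflow as tf

def bn(x, training):
    # AUTO_REUSE: create the variables on the first call, reuse them after that.
    with tf.variable_scope("shared", reuse=tf.AUTO_REUSE):
        return tf.layers.batch_normalization(x, axis=-1, training=training,
                                             name="bn")

x = tf.placeholder(tf.float32, [None, 5])
train_out = bn(x, training=True)    # creates beta/gamma/moving mean/moving var
eval_out  = bn(x, training=False)   # reuses those same four variables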
Q5) I'm not sure about this one. I guess it is meant for users who want to design new tricks for adjusting the scale and bias.
Q6) Yes, you are right.