How to do Xavier initialization on TensorFlow

Ale*_*dro 80 python tensorflow

I'm porting my Caffe network over to TensorFlow, but it doesn't seem to have Xavier initialization. I'm using truncated_normal, but this seems to make training much harder.

Sun*_*Kim 114

Since version 0.8 there is a Xavier initializer; see the documentation here.

You can use something like this:

W = tf.get_variable("W", shape=[784, 256],
           initializer=tf.contrib.layers.xavier_initializer())

  • Do you know of a way to do this without giving `get_variable` the shape, and instead giving the shape to the initializer? I previously had `tf.truncated_normal(shape=[dims[l-1], dims[l]], mean=mu[l], stddev=std[l], dtype=tf.float64)` and specified the shape there, so your suggestion rather breaks my code. Do you have any suggestions? (see the sketch below these comments) (3 upvotes)
  • "Current" link without a pinned version: https://www.tensorflow.org/api_docs/python/tf/contrib/layers/xavier_initializer (2 upvotes)
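
A minimal sketch of what the first comment asks about, passing the shape to the initializer call itself rather than to `get_variable` (assuming the TF 1.x `tf.contrib.layers` API; the shape [784, 256] is just an example):

import tensorflow as tf

# The object returned by xavier_initializer() is callable: calling it with a
# shape produces a tensor of initial values, which can back a plain tf.Variable.
initializer = tf.contrib.layers.xavier_initializer()
W = tf.Variable(initializer([784, 256]), name="W")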

Sau*_*tro 28

Just to add another example of how to define a tf.Variable initialized with Xavier and Yoshua's method:

graph = tf.Graph()
with graph.as_default():
    ...
    initializer = tf.contrib.layers.xavier_initializer()
    w1 = tf.Variable(initializer(w1_shape))
    b1 = tf.Variable(initializer(b1_shape))
    ...

This saved me from getting nan values in my loss function due to numerical instabilities when using multiple layers with RELUs.

  • This format works best with my code - it let me bring my learning rate back up to 0.5 (I had to lower it to 0.06 when adding another relu'd layer). Once I applied this initializer to all hidden layers, I got incredibly high validation rates within the first few hundred epochs. I can't believe the difference it made! (2 upvotes)

Del*_*lip 13

@Aleph7, Xavier/Glorot initialization depends on the number of incoming connections (fan_in), the number of outgoing connections (fan_out), and the kind of activation function (sigmoid or tanh) of the neuron. See: http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf

Now, to your question. This is how I would do it in TensorFlow:

(fan_in, fan_out) = ...
low = -4*np.sqrt(6.0/(fan_in + fan_out)) # use 4 for sigmoid, 1 for tanh activation
high = 4*np.sqrt(6.0/(fan_in + fan_out))
return tf.Variable(tf.random_uniform(shape, minval=low, maxval=high, dtype=tf.float32))
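
For reference, a self-contained version of the same idea, with the elided pieces filled in under the assumption that the weights have a simple (fan_in, fan_out) shape; the helper name xavier_init is only for illustration:

import numpy as np
import tensorflow as tf

def xavier_init(fan_in, fan_out, const=4.0):
    # const = 4 for sigmoid, 1 for tanh, as noted in the snippet above
    low = -const * np.sqrt(6.0 / (fan_in + fan_out))
    high = const * np.sqrt(6.0 / (fan_in + fan_out))
    return tf.Variable(tf.random_uniform([fan_in, fan_out],
                                         minval=low, maxval=high,
                                         dtype=tf.float32))

W1 = xavier_init(784, 256)  # e.g. the weights of a 784 -> 256 sigmoid layer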

Note that we should be sampling from a uniform distribution, not the normal distribution as suggested in the other answer.

Incidentally, I wrote a post yesterday about something different using TensorFlow that happens to also use Xavier initialization. If you're interested, there's also a python notebook with an end-to-end example: https://github.com/delip/blog-stuff/blob/master/tensorflow_ufp.ipynb


Hoo*_*ked 8

A nice wrapper around tensorflow called prettytensor gives an implementation in its source code (copied directly from here):

def xavier_init(n_inputs, n_outputs, uniform=True):
  """Set the parameter initialization using the method described.
  This method is designed to keep the scale of the gradients roughly the same
  in all layers.
  Xavier Glorot and Yoshua Bengio (2010):
           Understanding the difficulty of training deep feedforward neural
           networks. International conference on artificial intelligence and
           statistics.
  Args:
    n_inputs: The number of input nodes into each output.
    n_outputs: The number of output nodes for each input.
    uniform: If true use a uniform distribution, otherwise use a normal.
  Returns:
    An initializer.
  """
  if uniform:
    # 6 was used in the paper.
    init_range = math.sqrt(6.0 / (n_inputs + n_outputs))
    return tf.random_uniform_initializer(-init_range, init_range)
  else:
    # 3 gives us approximately the same limits as above since this repicks
    # values greater than 2 standard deviations from the mean.
    stddev = math.sqrt(3.0 / (n_inputs + n_outputs))
    return tf.truncated_normal_initializer(stddev=stddev)
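
A quick usage sketch, assuming the xavier_init function above has been defined (it also needs import math and import tensorflow as tf); since it returns an initializer object, it plugs straight into get_variable:

import tensorflow as tf

# xavier_init returns an initializer, not a tensor, so pass it to get_variable.
W = tf.get_variable("W", shape=[784, 256],
                    initializer=xavier_init(784, 256, uniform=True))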


Sal*_*ali 8

TF-contrib has xavier_initializer. Here is an example of how to use it:

import tensorflow as tf
a = tf.get_variable("a", shape=[4, 4], initializer=tf.contrib.layers.xavier_initializer())
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(a))

Besides this, tensorflow also has other initializers; one example is sketched below.
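
For instance, a small sketch with tf.contrib.layers.variance_scaling_initializer (which, to my knowledge, defaults to He-style initialization and is often preferred for ReLU layers), in the same style as the example above:

import tensorflow as tf

# variance_scaling_initializer defaults to He initialization
# (factor=2.0, mode='FAN_IN'), commonly used with ReLU activations.
b = tf.get_variable("b", shape=[4, 4],
                    initializer=tf.contrib.layers.variance_scaling_initializer())
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(b))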


y.s*_*hyk 6

In TensorFlow 2.0 and later, both tf.contrib.* and tf.get_variable() are deprecated. To do Xavier initialization you now have to switch to:

init = tf.initializers.GlorotUniform()
var = tf.Variable(init(shape=shape))
# or as a one-liner, with slightly confusing brackets
var = tf.Variable(tf.initializers.GlorotUniform()(shape=shape))

Glorot uniform and Xavier uniform are two different names for the same initialization type. If you want to know more about how to use initializers in TF 2.0 with or without Keras, refer to the documentation.
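
If Keras is being used, the same Glorot/Xavier initialization can be requested per layer; a minimal sketch (the layer size and activation are just placeholders):

import tensorflow as tf

# 'glorot_uniform' is already the default kernel_initializer of Dense layers,
# but it can also be spelled out explicitly:
layer = tf.keras.layers.Dense(
    256, activation="tanh",
    kernel_initializer=tf.keras.initializers.GlorotUniform())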