TensorFlow RNN weight matrix initialization

Question

yok*_*oki 7 tensorflow recurrent-neural-network

I'm using bidirectional_rnn with GRUCell, but this is a general question about RNNs in TensorFlow.

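I couldn't find out how the weight matrices (input-to-hidden, hidden-to-hidden) are initialized. Are they initialized randomly? To zeros? Are they initialized differently for each LSTM I create?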

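EDIT: Another motivation for this question is pre-training some LSTMs and using their weights in a subsequent model. I currently don't know how to do that without saving all the states and restoring the entire model.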

Thanks.

Answer 1

Zho*_*ang 10

How to initialize weight matrices for an RNN?

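I believe people use random normal initialization for RNN weight matrices. Check out the example in the TensorFlow GitHub repo. The notebook is a bit long, but it contains a simple LSTM model that uses tf.truncated_normal to initialize the weights and tf.zeros to initialize the biases (I have also tried initializing the biases with tf.ones before, and that seemed to work too). I believe the standard deviation is a hyperparameter you can tune yourself. Weight initialization can matter for gradient flow. As far as I know, though, the LSTM itself is designed to handle the vanishing gradient problem (and gradient clipping helps with the exploding gradient problem), so perhaps you don't need to be super careful about setting std_dev in an LSTM? I've read papers recommending Xavier initialization (see the TF API doc for the Xavier initializer) in the context of convolutional neural networks. I don't know whether people use it in RNNs, but I imagine you could try it there as well to see if it helps.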
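
To make the two schemes mentioned above concrete, here is a minimal pure-Python sketch (standard library only, not TensorFlow code). It assumes the usual definitions: a truncated normal re-draws any sample that falls more than two standard deviations from the mean, and Xavier (Glorot) uniform draws from [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). The function names and the 4x3 shape are made up for illustration.

```python
import math
import random


def truncated_normal(mean, stddev, rng):
    """Draw one normal sample, re-drawing until it lies within 2 stddev of the mean."""
    while True:
        x = rng.gauss(mean, stddev)
        if abs(x - mean) <= 2.0 * stddev:
            return x


def truncated_normal_matrix(rows, cols, mean=0.0, stddev=0.1, seed=0):
    """Build a rows x cols weight matrix (list of lists) of truncated-normal samples."""
    rng = random.Random(seed)
    return [[truncated_normal(mean, stddev, rng) for _ in range(cols)]
            for _ in range(rows)]


def xavier_uniform_limit(fan_in, fan_out):
    """Half-width of the uniform range used by Xavier/Glorot uniform initialization."""
    return math.sqrt(6.0 / (fan_in + fan_out))


# Hypothetical layer with 4 inputs and 3 hidden units.
weights = truncated_normal_matrix(4, 3, mean=0.0, stddev=0.1)
biases = [0.0] * 3  # zero-initialized biases, as in the notebook's LSTM
limit = xavier_uniform_limit(4, 3)
```

The stddev argument is the hyperparameter discussed above; the Xavier limit instead derives the scale from the layer's fan-in and fan-out.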

Now to follow up on @Allen's answer and the follow-up question you left in the comments.

How to control initialization with variable scope?

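Take the simple LSTM model in the TensorFlow GitHub Python notebook I linked to as an example. Specifically, if I wanted to refactor the LSTM part of that notebook's code using variable scope control, I might write something like the following...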

import tensorflow as tf  # TensorFlow 1.x API (tf.variable_scope / tf.get_variable)

def initialize_LSTMcell(vocabulary_size, num_nodes, initializer):
    '''initialize LSTMcell weights and biases, set variables to reuse mode'''
    gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
    with tf.variable_scope('LSTMcell') as scope:
        for gate in gates:
            with tf.variable_scope(gate) as gate_scope:
                # pass the initializer by keyword: the third positional argument of get_variable is dtype
                wx = tf.get_variable("wx", [vocabulary_size, num_nodes], initializer=initializer)
                wt = tf.get_variable("wt", [num_nodes, num_nodes], initializer=initializer)
                bi = tf.get_variable("bi", [1, num_nodes], initializer=tf.constant_initializer(0.0))
                gate_scope.reuse_variables()  # can probably be omitted: setting the 'LSTMcell' scope to reuse on the next line should also turn on reuse for its child scopes
        scope.reuse_variables()

def get_scope_variables(scope_name, variable_names):
    '''a helper function to fetch variables by scope_name and variable_names'''
    scope_vars = {}  # named scope_vars to avoid shadowing the built-in vars()
    with tf.variable_scope(scope_name, reuse=True):
        for var_name in variable_names:
            scope_vars[var_name] = tf.get_variable(var_name)
    return scope_vars

def LSTMcell(i, o, state):
    '''a function for performing LSTMcell computation'''
    gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
    var_names = ['wx', 'wt', 'bi']
    gate_comp = {}
    with tf.variable_scope('LSTMcell', reuse=True):
        for gate in gates:
            vars = get_scope_variables(gate, var_names)
            gate_comp[gate] = tf.matmul(i, vars['wx']) + tf.matmul(o, vars['wt']) + vars['bi']
    state = tf.sigmoid(gate_comp['forget_gate']) * state + tf.sigmoid(gate_comp['input_gate']) * tf.tanh(gate_comp['memory_cell'])
    output = tf.sigmoid(gate_comp['output_gate']) * tf.tanh(state)
    return output, state

Usage of the refactored code would look something like the following...

initialize_LSTMcell(vocabulary_size, num_nodes, tf.truncated_normal_initializer(mean=-0.1, stddev=.01))
#...Doing some computation...
LSTMcell(input_tensor, output_tensor, state)

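Even though the refactored code may look less straightforward, using variable scope control ensures scope encapsulation and allows flexible control over the variables (in my opinion, at least).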

On pre-training some LSTMs and using their weights in a subsequent model: how to do that without saving all the states and restoring the entire model?

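Assuming you have a pre-trained model frozen and loaded in, and you want to use its frozen 'wx', 'wt' and 'bi', you can simply find their parent scope names and variable names, then fetch the variables using a structure similar to the one in the get_scope_variables function.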

with tf.variable_scope(scope_name, reuse=True):
    var = tf.get_variable(var_name)

The TensorFlow documentation on variable scope and sharing variables covers this in more detail. I hope this helps.
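
As a possible complement to the scope-based lookup, here is a rough sketch of restoring only selected weights from a checkpoint instead of the whole model. It relies on tf.train.Saver accepting a var_list and matching variables by name. It is written against the TF 1.x API through tensorflow.compat.v1; the checkpoint path, shapes, initial values and the new_head variable are all made up for illustration, and only the scope and variable names mirror the code above.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # use TF 1.x graph/session semantics

# Hypothetical checkpoint location in a temporary directory.
ckpt_path = os.path.join(tempfile.mkdtemp(), "pretrained.ckpt")

# Stand-in for the pre-trained model: one gate weight matrix, set to ones.
with tf.Graph().as_default():
    with tf.variable_scope("LSTMcell"):
        with tf.variable_scope("input_gate"):
            tf.get_variable("wx", [5, 4], initializer=tf.ones_initializer())
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        tf.train.Saver().save(sess, ckpt_path)

# New model: same scope and variable name, plus a variable the checkpoint lacks.
with tf.Graph().as_default():
    with tf.variable_scope("LSTMcell"):
        with tf.variable_scope("input_gate"):
            wx = tf.get_variable("wx", [5, 4], initializer=tf.zeros_initializer())
    new_head = tf.get_variable("new_head", [2], initializer=tf.zeros_initializer())

    # Restrict the Saver to the pre-trained variable only.
    saver = tf.train.Saver(var_list=[wx])
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver.restore(sess, ckpt_path)  # overwrites wx, leaves new_head alone
        restored_wx = sess.run(wx)
        untouched_head = sess.run(new_head)
```

Variables outside var_list keep whatever their own initializer gave them, so the rest of the new model does not need to exist in the checkpoint.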

Answer 2

All*_*oie 5

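The RNN models create their variables with get_variable, and you can control the initialization by wrapping the code that creates those variables in a variable_scope and passing a default initializer to it. Unless the RNN explicitly specifies one (looking at the code, it doesn't), uniform_unit_scaling_initializer is used.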

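You should also be able to share model weights by declaring the second model and passing reuse=True to its variable_scope. As long as the namespaces match, the new model will get the same variables as the first model.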
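
A minimal sketch of both points, assuming the TF 1.x API through tensorflow.compat.v1. The build_weights helper, the "model" scope name and the shapes are hypothetical stand-ins for the code that builds an RNN; the sketch only shows how a scope-level default initializer and reuse=True behave with get_variable.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # use TF 1.x graph semantics


def build_weights(scope_name, reuse=None):
    """Create (or, with reuse=True, look up) a weight matrix and a bias vector."""
    # The initializer given to variable_scope becomes the default for every
    # get_variable call inside it that does not pass its own initializer.
    with tf.variable_scope(scope_name, reuse=reuse,
                           initializer=tf.truncated_normal_initializer(stddev=0.1)):
        w = tf.get_variable("w", shape=[4, 3])  # picks up the scope default
        b = tf.get_variable("b", shape=[3],
                            initializer=tf.zeros_initializer())  # explicit override
    return w, b


w1, b1 = build_weights("model")              # first model creates the variables
w2, b2 = build_weights("model", reuse=True)  # second model shares them by name
```

Because the scope and variable names match, the second call resolves to the existing model/w and model/b instead of creating new variables.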

Asked:	9 years, 4 months ago
Viewed:	13059 times
Last active:	6 years, 4 months ago