I am trying to port an existing trained PyTorch model to Keras.
During the port I got stuck on the LSTM layer.
The Keras implementation of an LSTM appears to have three sets of state matrices, while the PyTorch implementation has four.
For example, for a bidirectional LSTM with hidden_layers=64, input_size=512 and output_size=128, the state parameters look as follows.
State parameters of the Keras LSTM:
[<tf.Variable 'bidirectional_1/forward_lstm_1/kernel:0' shape=(512, 256) dtype=float32_ref>,
<tf.Variable 'bidirectional_1/forward_lstm_1/recurrent_kernel:0' shape=(64, 256) dtype=float32_ref>,
<tf.Variable 'bidirectional_1/forward_lstm_1/bias:0' shape=(256,) dtype=float32_ref>,
<tf.Variable 'bidirectional_1/backward_lstm_1/kernel:0' shape=(512, 256) dtype=float32_ref>,
<tf.Variable 'bidirectional_1/backward_lstm_1/recurrent_kernel:0' shape=(64, 256) dtype=float32_ref>,
<tf.Variable 'bidirectional_1/backward_lstm_1/bias:0' shape=(256,) dtype=float32_ref>]
State parameters of the PyTorch LSTM:
['rnn.0.rnn.weight_ih_l0', torch.Size([256, 512])],
['rnn.0.rnn.weight_hh_l0', torch.Size([256, 64])],
['rnn.0.rnn.bias_ih_l0', torch.Size([256])],
['rnn.0.rnn.bias_hh_l0', torch.Size([256])],
['rnn.0.rnn.weight_ih_l0_reverse', torch.Size([256, 512])],
['rnn.0.rnn.weight_hh_l0_reverse', torch.Size([256, 64])],
['rnn.0.rnn.bias_ih_l0_reverse', torch.Size([256])],
['rnn.0.rnn.bias_hh_l0_reverse', torch.Size([256])],
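For reference, these shapes can be reproduced with a minimal sketch like the following, assuming a stock torch.nn.LSTM:

import torch

# Bidirectional LSTM matching the setup above: input_size=512, hidden_size=64.
rnn = torch.nn.LSTM(input_size=512, hidden_size=64, bidirectional=True)
for name, param in rnn.named_parameters():
    # e.g. weight_ih_l0 [256, 512]; 256 = 4 * 64, the four gates concatenated
    print(name, list(param.shape))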
I tried looking at the source code of both implementations, but I could not understand much.
Can someone help me transform the 4-set of state parameters from PyTorch into the 3-set of state parameters in Keras?
They are really not that different. If you sum up the two bias vectors in PyTorch, the equations become the same as the ones implemented in Keras.
Here is the LSTM formula from the PyTorch documentation:
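i_t = σ(W_ii x_t + b_ii + W_hi h_{t-1} + b_hi)
f_t = σ(W_if x_t + b_if + W_hf h_{t-1} + b_hf)
g_t = tanh(W_ig x_t + b_ig + W_hg h_{t-1} + b_hg)
o_t = σ(W_io x_t + b_io + W_ho h_{t-1} + b_ho)
c_t = f_t * c_{t-1} + i_t * g_t
h_t = o_t * tanh(c_t)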
PyTorch uses two separate bias vectors: one for the input transformation (subscripts starting with i) and one for the recurrent transformation (subscripts starting with h).
In the Keras LSTMCell:
x_i = K.dot(inputs_i, self.kernel_i)
x_f = K.dot(inputs_f, self.kernel_f)
x_c = K.dot(inputs_c, self.kernel_c)
x_o = K.dot(inputs_o, self.kernel_o)
if self.use_bias:
    x_i = K.bias_add(x_i, self.bias_i)
    x_f = K.bias_add(x_f, self.bias_f)
    x_c = K.bias_add(x_c, self.bias_c)
    x_o = K.bias_add(x_o, self.bias_o)

if 0 < self.recurrent_dropout < 1.:
    h_tm1_i = h_tm1 * rec_dp_mask[0]
    h_tm1_f = h_tm1 * rec_dp_mask[1]
    h_tm1_c = h_tm1 * rec_dp_mask[2]
    h_tm1_o = h_tm1 * rec_dp_mask[3]
else:
    h_tm1_i = h_tm1
    h_tm1_f = h_tm1
    h_tm1_c = h_tm1
    h_tm1_o = h_tm1
i = self.recurrent_activation(x_i + K.dot(h_tm1_i,
                                          self.recurrent_kernel_i))
f = self.recurrent_activation(x_f + K.dot(h_tm1_f,
                                          self.recurrent_kernel_f))
c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1_c,
                                                self.recurrent_kernel_c))
o = self.recurrent_activation(x_o + K.dot(h_tm1_o,
                                          self.recurrent_kernel_o))
Only one bias is added in the input transformation. If we sum up the two biases in PyTorch, however, the equations become equivalent.
The double-bias LSTM is what is implemented in cuDNN (see the developer guide). I am not really familiar with PyTorch, but I suppose that is why they use two bias parameters. In Keras, the CuDNNLSTM layer also has two bias weight vectors.
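Putting this together, a minimal conversion sketch (assuming the stock torch.nn.LSTM and keras.layers.LSTM, which both concatenate the four gates along the weight matrices in the order i, f, c/g, o, so no gate reordering should be needed):

import numpy as np

def lstm_weights_torch_to_keras(weight_ih, weight_hh, bias_ih, bias_hh):
    """Map one direction of PyTorch LSTM parameters (as numpy arrays)
    to the three Keras LSTM weight arrays."""
    kernel = weight_ih.T            # (4*hidden, input) -> (input, 4*hidden)
    recurrent_kernel = weight_hh.T  # (4*hidden, hidden) -> (hidden, 4*hidden)
    bias = bias_ih + bias_hh        # collapse PyTorch's two biases into one
    return [kernel, recurrent_kernel, bias]

# Hypothetical usage for the bidirectional layer above: convert each direction
# and pass the six arrays to set_weights() in the order shown by the Keras
# variable listing (forward kernel/recurrent_kernel/bias, then backward).
# fw = lstm_weights_torch_to_keras(w_ih_l0, w_hh_l0, b_ih_l0, b_hh_l0)
# bw = lstm_weights_torch_to_keras(w_ih_l0_rev, w_hh_l0_rev, b_ih_l0_rev, b_hh_l0_rev)
# keras_bidirectional_layer.set_weights(fw + bw)

It is worth verifying the gate order on your own builds by pushing the same input through both layers and comparing the outputs.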