Equivalent of `return_sequences = False` for a PyTorch LSTM

Question

Zab*_*azi 4 nlp python-3.x lstm tensorflow pytorch

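In TensorFlow/Keras, we can simply set `return_sequences = False` on the last LSTM layer before the classification/fully connected/activation (softmax/sigmoid) layer to get rid of the time dimension.

In PyTorch, I haven't found anything similar. For a classification task I don't need a sequence-to-sequence model, but rather a many-to-one architecture.

Here is my simple bidirectional LSTM model.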

import torch
from torch import nn

class BiLSTMClassifier(nn.Module):
    def __init__(self):
        super(BiLSTMClassifier, self).__init__()
        self.embedding = torch.nn.Embedding(num_embeddings = 65000, embedding_dim = 64)
        self.bilstm = torch.nn.LSTM(input_size = 64, hidden_size = 8, num_layers = 2,
                                    batch_first = True, dropout = 0.2, bidirectional = True)
        # as we have 5 classes
        self.linear = nn.Linear(8*2*512, 5) # last dimension
    def forward(self, x):
        x = self.embedding(x)
        print(x.shape)
        x, _ = self.bilstm(x)
        print(x.shape)
        x = self.linear(x.reshape(x.shape[0], -1))
        print(x.shape)
        return x

# create our model

bilstmclassifier = BiLSTMClassifier()

If I look at the shapes after each layer,

xx = torch.tensor(X_encoded[0]).reshape(1,512)
print(xx.shape) 
# torch.Size([1, 512])
bilstmclassifier(xx)
#torch.Size([1, 512, 64])
#torch.Size([1, 512, 16])
#torch.Size([1, 5])

What should I do so that the last LSTM returns a tensor of shape `(1, 16)` instead of `(1, 512, 16)`?

Answer 1

xdu*_*ch0 7

The simplest way is to index the tensor:

x = x[:, -1, :]

Here `x` is the RNN output. Of course, if `batch_first` is `False`, you have to index the time axis with `x[-1, :, :]` (or just `x[-1]`) instead. It turns out this is the same thing TensorFlow/Keras does. The relevant code is in `K.rnn`:

last_output = tuple(o[-1] for o in outputs)

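Note that the code at this point uses the `time_major` data format, so the index is on the first axis. Also, `outputs` is a tuple because it can hold multiple layers, state/cell pairs, and so on, but it is generally the output sequence for all time steps.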
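To make the axis difference concrete on the PyTorch side, here is a minimal sketch. The tensor sizes and variable names are made up for illustration; the only assumption is that two `nn.LSTM` modules share the same weights and differ only in `batch_first`. Indexing the last step on the time axis then gives the same values in both layouts:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two LSTMs with identical weights; only the expected input layout differs.
lstm_batch_first = nn.LSTM(input_size=4, hidden_size=3, batch_first=True)
lstm_time_major = nn.LSTM(input_size=4, hidden_size=3, batch_first=False)
lstm_time_major.load_state_dict(lstm_batch_first.state_dict())

x = torch.randn(2, 5, 4)  # (batch, time, features), illustrative sizes

with torch.no_grad():
    out_bf, _ = lstm_batch_first(x)                             # (batch, time, hidden)
    out_tm, _ = lstm_time_major(x.transpose(0, 1).contiguous())  # (time, batch, hidden)

last_bf = out_bf[:, -1, :]  # time axis is axis 1 when batch_first=True
last_tm = out_tm[-1]        # time axis is axis 0, like o[-1] in K.rnn

same = torch.allclose(last_bf, last_tm, atol=1e-6)
print(last_bf.shape, last_tm.shape, same)
```

Both slices have shape `(batch, hidden)`; which axis you index depends only on the layout.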

`last_output` is then used in the `RNN` class as follows:

if self.return_sequences:
    output = K.maybe_convert_to_ragged(is_ragged_input, outputs, row_lengths)
else:
    output = last_output

So overall, we can see that `return_sequences=False` just uses `outputs[-1]`.
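Applied to the model in the question, this could look roughly like the sketch below. It is an adaptation, not the asker's final code: the layer sizes are copied from the question, while the class name `BiLSTMLastStep` and the random token ids are hypothetical stand-ins. One caveat that is my own addition: with `bidirectional=True`, the backward half of `out[:, -1, :]` has only seen the last token, so the sketch also builds the alternative of concatenating the final hidden states `h_n[-2]` and `h_n[-1]` of the last layer.

```python
import torch
from torch import nn


class BiLSTMLastStep(nn.Module):  # hypothetical name, adapted from the question
    def __init__(self, num_classes=5):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings=65000, embedding_dim=64)
        self.bilstm = nn.LSTM(input_size=64, hidden_size=8, num_layers=2,
                              batch_first=True, dropout=0.2, bidirectional=True)
        # 2 directions * hidden_size 8 = 16 features once the time axis is gone
        self.linear = nn.Linear(8 * 2, num_classes)

    def forward(self, x):
        x = self.embedding(x)        # (batch, seq_len, 64)
        out, _ = self.bilstm(x)      # (batch, seq_len, 16)
        last = out[:, -1, :]         # (batch, 16), the return_sequences=False analogue
        return self.linear(last)     # (batch, num_classes)


model = BiLSTMLastStep().eval()                 # eval() turns dropout off
tokens = torch.randint(0, 65000, (1, 512))      # stand-in for one encoded sequence

with torch.no_grad():
    logits = model(tokens)
    out, (h_n, _) = model.bilstm(model.embedding(tokens))

print(logits.shape)          # torch.Size([1, 5])
print(out[:, -1, :].shape)   # torch.Size([1, 16])

# h_n has shape (num_layers * num_directions, batch, hidden);
# its last two entries are the last layer's forward and backward final states.
final_states = torch.cat([h_n[-2], h_n[-1]], dim=1)   # (batch, 16)

# The forward final state sits at the last time step of `out`,
# the backward final state at the first time step.
fwd_matches = torch.allclose(out[:, -1, :8], h_n[-2], atol=1e-6)
bwd_matches = torch.allclose(out[:, 0, 8:], h_n[-1], atol=1e-6)
print(fwd_matches, bwd_matches)
```

If you want the backward direction to summarize the whole sequence, feeding `final_states` to the linear layer instead of `out[:, -1, :]` is one option; the input size of the linear layer stays 16 either way.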

Awesome, I was thinking of something similar but hadn't looked into the implementation. Thanks for digging. (2 upvotes)
