AttentionDecoderRNN without MAX_LENGTH

alv*_*vas 4 machine-translation recurrent-neural-network attention-model sequence-to-sequence pytorch

From the PyTorch Seq2Seq tutorial, http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html#attention-decoder

we can see that the attention mechanism depends heavily on the MAX_LENGTH parameter to determine the output dimensions of attn -> attn_softmax -> attn_weights, i.e.

import torch
import torch.nn as nn
import torch.nn.functional as F

MAX_LENGTH = 10  # the tutorial caps sentences at 10 tokens

class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=MAX_LENGTH):
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
        self.dropout = nn.Dropout(self.dropout_p)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)
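The place where that max_length dimension actually matters is the forward pass. The sketch below is a lightly annotated rendering of the tutorial's forward method (see the linked page for the exact code): attn produces a (1, max_length) weight vector, which is multiplied against an encoder_outputs tensor that the training loop zero-pads to max_length.

    def forward(self, input, hidden, encoder_outputs):
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)

        # self.attn maps (1, hidden_size * 2) -> (1, max_length):
        # one attention score per (padded) encoder position
        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)

        # encoder_outputs is zero-padded to (max_length, hidden_size),
        # so this bmm works even when the source sentence is shorter
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))

        output = torch.cat((embedded[0], attn_applied[0]), 1)
        output = self.attn_combine(output).unsqueeze(0)

        output = F.relu(output)
        output, hidden = self.gru(output, hidden)

        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden, attn_weights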

More specifically,

self.attn = nn.Linear(self.hidden_size * 2, self.max_length)

I understand that the MAX_LENGTH variable is a mechanism to reduce the number of parameters that need to be trained in the AttentionDecoderRNN.

If we don't have a MAX_LENGTH decided beforehand, what value should we initialize the attn layer with?

Would it be output_size? If so, then we would be learning attention with respect to the full vocabulary of the target language. Isn't that the real intention of the Bahdanau (2015) attention paper?

pat*_*_ai 5

Attention modulates the input to the decoder. That is, attention modulates the encoded sequence, which has the same length as the input sequence. Therefore, MAX_LENGTH should be the maximum sequence length over all the input sequences.
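In other words, the fixed max_length is just a padding convenience in the tutorial, not part of the attention idea itself. As a rough illustration (my own sketch, not code from the tutorial or the paper), a Bahdanau-style additive attention can score the decoder state against the actual encoder outputs, so the weight vector automatically has length seq_len and no MAX_LENGTH is needed:

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Length-agnostic additive attention (hypothetical helper, not from the tutorial)."""
    def __init__(self, hidden_size):
        super(AdditiveAttention, self).__init__()
        self.W_query = nn.Linear(hidden_size, hidden_size, bias=False)  # projects the decoder state
        self.W_key = nn.Linear(hidden_size, hidden_size, bias=False)    # projects the encoder outputs
        self.v = nn.Linear(hidden_size, 1, bias=False)                  # reduces each position to one score

    def forward(self, decoder_hidden, encoder_outputs):
        # decoder_hidden: (1, hidden_size); encoder_outputs: (seq_len, hidden_size)
        scores = self.v(torch.tanh(
            self.W_query(decoder_hidden) + self.W_key(encoder_outputs)))  # (seq_len, 1)
        attn_weights = F.softmax(scores.squeeze(1), dim=0)                # (seq_len,)
        context = attn_weights.unsqueeze(0) @ encoder_outputs             # (1, hidden_size)
        return context, attn_weights

# usage: the weight vector grows with the source sentence, no padding required
attn = AdditiveAttention(hidden_size=256)
encoder_outputs = torch.randn(13, 256)   # a 13-token source sentence
decoder_hidden = torch.randn(1, 256)
context, weights = attn(decoder_hidden, encoder_outputs)  # weights.shape == torch.Size([13])

With this kind of scoring, the parameter count depends only on hidden_size, so nothing in the model is tied to the length of the longest training sentence.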