How to tie word embeddings and softmax weights in Keras?

Question

Roh*_*pta 5 nlp machine-learning neural-network deep-learning keras

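It is common in various neural network architectures for NLP and vision-language problems to tie the weights of the initial word embedding layer to those of the output softmax. This usually improves sentence generation quality. (See an example here.)

In Keras, word embedding layers are typically built with the Embedding class, but there seems to be no easy way to tie the weights of this layer to the output softmax. Would anyone happen to know how this could be implemented?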

Answer 1

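Be aware that Press and Wolf do not propose freezing the weights to some pretrained values, but tying them. That is, ensuring that the input and output weights are always the same during training (in the sense of being kept in sync).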
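
To make the difference between freezing and tying concrete, here is a minimal, framework-free sketch. It is not Keras code: the `sgd_step` helper, the matrices and the gradient values are all made up for illustration. It only shows that a tied parameter is a single object that keeps learning, while frozen copies stay identical simply because nothing ever changes.

```python
# Framework-free sketch (all names are illustrative, not Keras API).

def sgd_step(matrix, grad, lr=0.1):
    """In-place SGD update: matrix <- matrix - lr * grad."""
    for row, grad_row in zip(matrix, grad):
        for j, g in enumerate(grad_row):
            row[j] -= lr * g

initial = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # V=3 words, H=2 dims

# Frozen: input and output each hold a copy that is never updated.
frozen_input = [row[:] for row in initial]
frozen_output = [row[:] for row in initial]

# Tied: a single trainable V x H parameter referenced from both ends.
shared = [row[:] for row in initial]
tied_input = shared
tied_output = shared

# Made-up gradients reaching the parameter through each of its two uses.
grad_via_input = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
grad_via_output = [[0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]

# Because there is only one parameter, the two contributions are summed.
total_grad = [[a + b for a, b in zip(r_in, r_out)]
              for r_in, r_out in zip(grad_via_input, grad_via_output)]
sgd_step(shared, total_grad)

print("tied ends identical:", tied_input == tied_output)
print("tied weights learned:", shared != initial)
print("frozen weights learned:", frozen_input != initial)
```

In both cases the two ends agree, but only the tied parameter moves away from its initial values.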

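In a typical NLP model (e.g. language modelling or translation), you have an input dimension (the vocabulary) of size V and a hidden representation of size H. You start with an Embedding layer, which is a V x H matrix. The output layer is (probably) something like Dense(V, activation='softmax'), which is an H2 x V matrix. When tying the weights, we want these matrices to be the same (and therefore H == H2). To do this in Keras, I think the way to go is via shared layers:

In your model, you need to instantiate a shared embedding layer (of dimension V x H) and apply it to both your input and your output. But you need to transpose it to get the desired output dimensions (H x V). So, we declare a TiedEmbeddingsTransposed layer, which transposes the embedding matrix of a given layer (and applies an activation function):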

# Imports assumed for standalone Keras 2.x; adjust if you use tf.keras.
from keras import activations
from keras import backend as K
from keras.layers import Layer


class TiedEmbeddingsTransposed(Layer):
    """Layer for tying embeddings in an output layer.

    A regular embedding layer has the shape: V x H (V: size of the vocabulary. H: size of the projected space).
    In this layer, we'll go: H x V.
    With the same weights as the regular embedding.
    In addition, it may have an activation.

    # References
        - [Using the Output Embedding to Improve Language Models](https://arxiv.org/abs/1608.05859)
    """

    def __init__(self, tied_to=None,
                 activation=None,
                 **kwargs):
        super(TiedEmbeddingsTransposed, self).__init__(**kwargs)
        self.tied_to = tied_to
        self.activation = activations.get(activation)

    def build(self, input_shape):
        self.transposed_weights = K.transpose(self.tied_to.weights[0])
        self.built = True

    def compute_mask(self, inputs, mask=None):
        return mask

    def compute_output_shape(self, input_shape):
        return input_shape[0], K.int_shape(self.tied_to.weights[0])[0]

    def call(self, inputs, mask=None):
        output = K.dot(inputs, self.transposed_weights)
        if self.activation is not None:
            output = self.activation(output)
        return output

    def get_config(self):
        config = {'activation': activations.serialize(self.activation)}
        base_config = super(TiedEmbeddingsTransposed, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
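
The shape bookkeeping of this layer can be checked without Keras. The following is only a rough sketch in plain Python, with made-up sizes and values and hypothetical helper names (`embed`, `transpose`, `tied_output`): it looks up a row of a V x H matrix on the input side and multiplies by the transposed H x V matrix on the output side, followed by a softmax.

```python
import math

V, H = 4, 3  # illustrative vocabulary size and hidden size

# Embedding matrix E with shape V x H (one row per word).
E = [[0.1 * (i + j) for j in range(H)] for i in range(V)]

def embed(word_id):
    """Input side: look up row `word_id` of E, giving a length-H vector."""
    return E[word_id]

def transpose(m):
    """Turn a V x H list of lists into an H x V list of lists."""
    return [list(col) for col in zip(*m)]

def tied_output(hidden):
    """Output side: hidden (length H) times E^T (H x V), then softmax over V."""
    E_t = transpose(E)
    logits = [sum(hidden[k] * E_t[k][v] for k in range(H)) for v in range(V)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

hidden = embed(2)  # stand-in for whatever the model computes in between
probs = tied_output(hidden)
print(len(probs), sum(probs))
```

The same matrix E serves both ends, so the output has one probability per vocabulary entry without introducing a second V-sized parameter.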

The usage of this layer is:

# Declare the shared embedding layer
shared_embedding_layer = Embedding(V, H)
# Obtain word embeddings
word_embedding = shared_embedding_layer(input)
# Do stuff with your model
# Compute output (e.g. a vocabulary-size probability vector) with the shared layer:
output = TimeDistributed(TiedEmbeddingsTransposed(tied_to=shared_embedding_layer, activation='softmax'))(intermediate_rep)

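I have tested this in NMT-Keras and it trains properly. But when I try to load a trained model, I get an error related to the way Keras loads models: it does not load the weights from tied_to. I have found several questions about this (1, 2, 3), but I have not managed to solve the issue. If anyone has ideas on the next steps to take, I would be very glad to hear them :)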

Answer 2

Mar*_*jko 0

As you may read here, you should simply set the trainable flag to False. E.g.

aux_output = Embedding(..., trainable=False)(input)
....
output = Dense(nb_of_classes, .. ,activation='softmax', trainable=False)

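But the asker only wants the input and output embeddings to be tied, so that they stay identical to each other during training. `trainable=False` does not help with that, because it means the embeddings are fixed forever. Yes, they would be the same, but the asker also wants to learn the embeddings, which means trainable has to be True. (8 upvotes)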

Archived:	8 years, 3 months ago
Views:	1556
Last activity:	7 years, 8 months ago