如何在 LSTM/GRU 之上使用 keras 注意力层？

Question

如何在 LSTM/GRU 之上使用 keras 注意力层？

我想实现一个基于 LSTM 或 GRU 的编码器-解码器架构，并带有注意层。我看到 Keras 有一个层tensorflow.keras.layers.Attention，我想使用它（所有其他问题和资源似乎都自己实现它或使用第三方库）。另外，我没有使用网络进行序列到序列的转换，而是进行二进制分类，因此文档中提供的示例对我来说有点令人困惑。

我正在想象一个这样的模型。

import tensorflow as tf

x = tf.keras.Input((100, 50))

# encoder
hidden_states = tf.keras.layers.GRU(32, return_sequences=True)(x)

# decoder + attention
? = tf.keras.layers.Attention()([?, ?])
z = tf.keras.layers.GRU(32)(?)

# classification
z = tf.keras.layers.Dense(1, activation='sigmoid')(z)

model = tf.keras.Model(inputs=x, outputs=z)

Run Code Online (Sandbox Code Playgroud)

我不清楚该网络的解码器和注意力部分。我知道我需要根据编码器的隐藏状态和解码器当前的隐藏状态创建上下文向量。

我将如何实现该网络的解码器和注意力部分？

Answer 1

M.I*_*nat 10

这对您来说可能是一个迟到的答案，但为了将来的参考，我在这里根据您的想象模型提供了一个入门代码基础。目前，内置了三个注意力层，即

\n

- MultiHeadAttention layer\n- Attention layer (a.k.a. Luong-style attention)\n- AdditiveAttention layer (a.k.a. Bahdanau-style attention)\n

Run Code Online (Sandbox Code Playgroud)\n

对于起始代码，我们将在编码器部分使用 Luong 风格，在解码器部分使用 Bahdanau 风格的注意力机制。整体自动编码器架构将是

\n

a. encoder: input -> embedding -> gru -> luong-style-attn \n\nb. decoder: input -> lstm -> bahdanau-style-attn -> gap -> classifier\n              \xe2\x86\x93_____________________\n\n# whole model \nautoencoder: encoder + decoder \n

Run Code Online (Sandbox Code Playgroud)\n

让我们相应地构建模型。

\n

编码器

\n

from tensorflow.keras import Input, Model \nfrom tensorflow.keras.layers import * \nfrom tensorflow.keras import backend \nfrom tensorflow.keras import utils\nbackend.clear_session()\n\n# int sequences.\nenc_inputs = Input(shape=(20,), name=\'enc_inputs\')\n\n# Embedding lookup and GRU\nembedding = Embedding(input_dim=100, output_dim=64)(enc_inputs)\nwhole_sequence = GRU(4, return_sequences=True)(embedding)\n\n# Query-value attention of shape [batch_size, Tq, filters].\nquery_value_attention_seq = Attention()([whole_sequence, whole_sequence])\n\n# build encoder model \nencoder = Model(enc_inputs, query_value_attention_seq, name=\'encoder\')\n

Run Code Online (Sandbox Code Playgroud)\n

检查布局。

\n

utils.plot_model(encoder, show_shapes=True)\n

Run Code Online (Sandbox Code Playgroud)\n

\n

解码器

\n

# int sequences.\ndec_input = Input(shape=(20, 4), name=\'dec_inputs\')\n\n# LSTM\nwhole_sequence = LSTM(4, return_sequences=True)(dec_input)\n\n# Query-value attention of shape [batch_size, Tq, filters].\nquery_value_attention_seq = AdditiveAttention()([whole_sequence, dec_input])\n\n# Reduce over the sequence axis to produce encodings of shape\n# [batch_size, filters].\nquery_value_attention = GlobalAveragePooling1D()(query_value_attention_seq)\n\n# classification\ndec_output = Dense(1, activation=\'sigmoid\')(query_value_attention)\n\n# build decoder model\ndecoder = Model(dec_input, dec_output, name=\'decoder\')\n

Run Code Online (Sandbox Code Playgroud)\n

检查布局。

\n

自动编码器

\n

# encoder \nencoder_init = Input(shape=(20, ))\nencoder_output = encoder(encoder_init); print(encoder_output.shape)\n\n# decoder \ndecoder_output = decoder(encoder_output); print(decoder_output.shape)\n\n# bind all: autoencoder \nautoencoder = Model(encoder_init, decoder_output)\n\n# check layout \nutils.plot_model(autoencoder, show_shapes=True, expand_nested=True)\n

Run Code Online (Sandbox Code Playgroud)\n

\n

假人训练

\n

x_train = np.random.randint(0, 10, (100,20)); print(x_train.shape)\ny_train = np.random.randint(2, size=(100, 1)); print(y_train.shape)\n(100, 20)\n(100, 1)\n\nautoencoder.compile(\'adam\', \'binary_crossentropy\')\nautoencoder.fit(x_train, y_train, epochs=5, verbose=2)\n\nEpoch 1/5\n4/4 - 4s - loss: 0.6674\nEpoch 2/5\n4/4 - 0s - loss: 0.6637\nEpoch 3/5\n4/4 - 0s - loss: 0.6600\nEpoch 4/5\n4/4 - 0s - loss: 0.6590\nEpoch 5/5\n4/4 - 0s - loss: 0.6571\n

Run Code Online (Sandbox Code Playgroud)\n

\n

资源

\n

另外，你可以阅读我关于注意力机制的另一个答案。

\n

这是我最喜欢的关于多头变压器的视频，是3系列的视频。

\n

归档时间：	6 年，1 月前
查看次数：	4496 次
最近记录：	4 年，8 月前