Using BERT embeddings in a Keras Embedding layer

Question

Amb*_*kar 6 nlp embedding python-3.x keras bert-language-model

I want to use BERT word embeddings in the embedding layer of an LSTM instead of the usual default embedding layer. Is there any way to do this?

Answer 1

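Hope these links help:

Hugging Face Transformers with TF 2.0 (TPU training) with embeddings: https://www.kaggle.com/abhilash1910/nlp-workshop-2-ml-india
Contextual similarity with BERT embeddings (PyTorch): https://github.com/abhilash1910/BERTSimilarity
To generate distinct sentence embeddings with BERT or a BERT variant, it is advisable to pick the right layer of the model. In some cases the following pattern can be used to obtain the embeddings (TF 2.0/Keras):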

import tensorflow as tf
import transformers

transformer_model = transformers.TFBertModel.from_pretrained('bert-large-uncased')

# Token ids and attention mask, both padded/truncated to 128 tokens
input_ids = tf.keras.layers.Input(shape=(128,), name='input_token', dtype='int32')
input_masks_ids = tf.keras.layers.Input(shape=(128,), name='masked_token', dtype='int32')
# [0] is the last hidden state: one contextual vector per token
X = transformer_model(input_ids, attention_mask=input_masks_ids)[0]
X = tf.keras.layers.Dropout(0.2)(X)
X = tf.keras.layers.Dense(6, activation='softmax')(X)  # the layer must be applied to X
model = tf.keras.Model(inputs=[input_ids, input_masks_ids], outputs=X)
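Since the question is about an LSTM, one possible way to connect the two is to treat BERT as a feature extractor and let the LSTM consume its per-token vectors directly, so that no Keras Embedding layer is involved at all. The following is only a minimal sketch under stated assumptions: the function name `build_lstm_on_bert_features`, the LSTM width of 64, and the hidden size of 768 (typical of BERT-base models; bert-large uses 1024) are illustrative choices, and random numbers stand in for real BERT output so that the snippet runs without downloading a model. Padding positions are not masked here.

```python
import numpy as np
import tensorflow as tf

SEQ_LEN = 128     # padded sequence length, as in the answer's inputs
HIDDEN = 768      # assumption: hidden size of a BERT-base style model
NUM_CLASSES = 6   # same number of output classes as in the answer


def build_lstm_on_bert_features(seq_len=SEQ_LEN, hidden=HIDDEN,
                                num_classes=NUM_CLASSES):
    """Hypothetical helper: LSTM classifier over precomputed token vectors."""
    # The input is already a sequence of dense vectors (for example the last
    # hidden state of BERT), so it takes the place of an Embedding layer.
    token_vectors = tf.keras.layers.Input(
        shape=(seq_len, hidden), name="bert_token_vectors")
    x = tf.keras.layers.LSTM(64)(token_vectors)
    x = tf.keras.layers.Dropout(0.2)(x)
    probs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs=token_vectors, outputs=probs)


model = build_lstm_on_bert_features()

# Stand-in for real BERT features: random values with the right shape only.
fake_features = np.random.rand(2, SEQ_LEN, HIDDEN).astype("float32")
preds = model.predict(fake_features, verbose=0)
print(preds.shape)
```

In practice the random array would be replaced by token vectors computed once with a BERT model, which keeps BERT frozen and makes LSTM training comparatively cheap.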


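If this does not work, see "feature extraction" in the Hugging Face repository for obtaining embeddings. The pipelines documentation (https://huggingface.co/transformers/main_classes/pipelines.html) covers it; here is an example: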

import numpy as np
from transformers import AutoTokenizer, pipeline, TFDistilBertModel
from scipy.spatial.distance import cosine


def transformer_embedding(name, inp, model_class):
    # Load the pretrained model and its matching tokenizer
    model = model_class.from_pretrained(name)
    tokenizer = AutoTokenizer.from_pretrained(name)
    pipe = pipeline('feature-extraction', model=model, tokenizer=tokenizer)
    features = pipe(inp)
    # Drop the batch dimension: the result has shape (num_tokens, hidden_size)
    return np.squeeze(features)


z = ['The brown fox jumped over the dog', 'The ship sank in the Atlantic Ocean']
embedding_features1 = transformer_embedding('distilbert-base-uncased', z[0], TFDistilBertModel)
embedding_features2 = transformer_embedding('distilbert-base-uncased', z[1], TFDistilBertModel)
# scipy's cosine() is a distance, so 1 - cosine() is the cosine similarity
# of the first-token ([CLS]) vectors of the two sentences
similarity = 1 - cosine(embedding_features1[0], embedding_features2[0])
print(similarity)
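The example above compares only the first-token vector of each sentence. A common alternative, which is not what the answer does, is to average all token vectors into a single sentence vector before comparing. The sketch below illustrates that idea with plain NumPy on tiny hand-made matrices instead of real model output; `mean_pool` and `cosine_similarity` are hypothetical helper names introduced for illustration.

```python
import numpy as np


def mean_pool(token_vectors):
    """Average a (num_tokens, hidden_size) matrix into one (hidden_size,) vector."""
    return np.asarray(token_vectors, dtype=float).mean(axis=0)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 minus scipy's cosine distance)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy "token vectors" standing in for pipeline output (2 tokens, hidden size 2).
sent_a = np.array([[1.0, 0.0], [0.0, 1.0]])
sent_b = np.array([[2.0, 2.0], [4.0, 4.0]])    # pooled vector parallel to sent_a's
sent_c = np.array([[1.0, -1.0], [3.0, -3.0]])  # pooled vector orthogonal to sent_a's

sim_ab = cosine_similarity(mean_pool(sent_a), mean_pool(sent_b))
sim_ac = cosine_similarity(mean_pool(sent_a), mean_pool(sent_c))
print(sim_ab, sim_ac)
```

With real features, the matrices returned by the feature-extraction pipeline would be passed to `mean_pool` in place of the toy arrays.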


Thanks.
