How to increase the dimension (vector size) of BERT sentence-transformer embeddings

Jun*_*ari 4 nlp artificial-intelligence machine-learning bert-language-model

I am using sentence-transformers for semantic search, but sometimes it does not understand the contextual meaning and returns the wrong results, e.g. BERT problem with context/semantic search in Italian language.

By default, the sentence embedding vector has 768 columns, so how can I increase that dimensionality so that the model can understand the contextual meaning in more depth?

Code:

# Load the BERT Model
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('bert-base-nli-mean-tokens')

# Setup a Corpus
# A corpus is a list with documents split by sentences.

sentences = ['Absence of sanity', 
             'Lack of saneness',
             'A man is eating food.',
             'A man is eating a piece of bread.',
             'The girl is carrying a baby.',
             'A man is riding a horse.',
             'A woman is playing violin.',
             'Two men pushed carts through the woods.',
             'A man is riding a white horse on an enclosed ground.',
             'A monkey is playing drums.',
             'A cheetah is running behind its prey.']

# Each sentence is encoded as a 1-D vector with 768 columns
sentence_embeddings = model.encode(sentences) ### how to increase vector dimension

print('Sample BERT embedding vector - length', len(sentence_embeddings[0]))

print('Sample BERT embedding vector - note includes negative values', sentence_embeddings[0])

Ore*_*fon 5

Unfortunately, the only way to increase the embedding dimensionality in a meaningful way is to retrain the model. :(

However, maybe that is not what you actually need... maybe you should consider fine-tuning the model instead:

I recommend you take a look at sentence-transformers from UKPLabs. They have pre-trained sentence-embedding models for more than 100 languages, and the best part is that you can fine-tune these models.

Good luck!