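gensim's wv.most_similar returns phonetically close words (words that sound alike) instead of semantically similar ones. Is this normal? Why does it happen?
Here is the documentation for most_similar: https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.WordEmbeddingsKeyedVectors.most_similar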
In [144]: len(vectors.vocab)
Out[144]: 32966
...
In [140]: vectors.most_similar('fight')
Out[140]:
[('Night', 0.9940935373306274),
('knight', 0.9928507804870605),
('fright', 0.9925899505615234),
('light', 0.9919329285621643),
('bright', 0.9914385080337524),
('plight', 0.9912853240966797),
('Eight', 0.9912533760070801),
('sight', 0.9908033013343811),
('playwright', 0.9905624985694885),
('slight', 0.990411102771759)]
In [141]: vectors.most_similar('care')
Out[141]:
[('spare', 0.9710584878921509),
('scare', 0.9626247882843018),
('share', 0.9594929218292236),
('prepare', 0.9584596157073975),
('aware', 0.9551078081130981),
('negare', 0.9550014138221741),
('glassware', 0.9507938027381897),
('Welfare', 0.9489598274230957),
('warfare', 0.9487678408622742),
('square', 0.9473209381103516)]
The training data consists of academic papers. Here is my training script:
from gensim.models.fasttext import FastText as FT_gensim
import gensim.models.keyedvectors as word2vec
dim_size = 300
epochs = 10
model = …
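
One thing that may be worth checking, since the script trains a FastText model: FastText builds each word vector from the word's character n-grams (gensim's defaults are min_n=3 and max_n=6, with `<` and `>` as word-boundary markers), so words with the same ending share many subword units. The sketch below is plain Python and not gensim's internal implementation. The helper names `char_ngrams` and `ngram_overlap` are hypothetical, and the Jaccard overlap is only an illustrative proxy, not the similarity that most_similar computes.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the set of character n-grams of `word`, wrapped in
    FastText-style boundary markers '<' and '>'."""
    wrapped = f"<{word}>"
    return {
        wrapped[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(wrapped) - n + 1)
    }


def ngram_overlap(a, b):
    """Jaccard overlap between the n-gram sets of two words
    (an illustrative proxy only, not gensim's similarity)."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    # Compare 'fight' with two rhyming neighbours and one semantically
    # related word that shares no spelling with it.
    for other in ["night", "fright", "battle"]:
        shared = sorted(char_ngrams("fight") & char_ngrams(other))
        print(other, round(ngram_overlap("fight", other), 3), shared)
```

If the rhyming words share many n-grams while a semantically related word shares none, that would be consistent with the subword component dominating the vectors in this model, though it does not by itself prove that this is the cause.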