Asked by Md.*_*rif · Votes: 8 · Tags: python, nlp, gensim, word2vec, word-embedding
I trained a Word2Vec model using Gensim 3.8.0. Later, I tried to use the pretrained model with Gensim 4.0.0 on GCP, using the following code:
model = KeyedVectors.load_word2vec_format(wv_path, binary= False)
words = model.wv.vocab.keys()
self.word2vec = {word:model.wv[word]%EMBEDDING_DIM for word in words}
I got an error saying that "model.wv" was removed in Gensim 4.0.0, so I changed the code to:
model = KeyedVectors.load_word2vec_format(wv_path, binary= False)
words = model.vocab.keys()
word2vec = {word:model[word]%EMBEDDING_DIM for word in words}
This time I got the following error:
AttributeError: The vocab attribute was removed from KeyedVector in Gensim 4.0.0.
Use KeyedVector's .key_to_index dict, .index_to_key list, and methods .get_vecattr(key, attr) and .set_vecattr(key, attr, new_val) instead.
See https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
Can anyone suggest how I can use the pretrained model and return a dictionary in Gensim 4.0.0?
The migration notes explain the major changes and how to adapt your code:
https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
Following the guidance there: since your model variable is already an instance of KeyedVectors, you can get the list of words with:
model.index_to_key
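Your code only iterates over the vocab's keys, so it doesn't actually need a dict there. The closest replacement, model.key_to_index, is a somewhat different dict anyway: it maps each word to its index position. You can still use model[key] exactly as before to get an individual vector.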
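A minimal sketch of the question's dict-building code ported to the Gensim 4 API. It is written against any KeyedVectors-like object so it can be tried without a trained model; the helper names vectors_as_dict, vectors_as_dict_bulk and word_index are hypothetical. The only Gensim 4 features relied on are index_to_key, key_to_index, the vectors matrix and model[key] indexing. The %EMBEDDING_DIM step is deliberately left out.

```python
def vectors_as_dict(kv):
    """Return {word: vector} for a Gensim 4 KeyedVectors instance.

    Gensim 4 replacement for the 3.x idiom
    {word: model[word] for word in model.vocab.keys()}.
    """
    # index_to_key is the list of words in index order (replaces .vocab.keys())
    return {word: kv[word] for word in kv.index_to_key}


def vectors_as_dict_bulk(kv):
    """Same mapping, built by pairing words with rows of kv.vectors.

    Row i of kv.vectors belongs to kv.index_to_key[i], so zipping them
    avoids one lookup per word. With NumPy arrays the values are views
    into kv.vectors, so copy them if the model may be modified later.
    """
    return dict(zip(kv.index_to_key, kv.vectors))


def word_index(kv, word):
    """Row position of `word` in kv.vectors (3.x: model.vocab[word].index)."""
    return kv.key_to_index[word]
```

With a real model this would be called as vectors_as_dict(KeyedVectors.load_word2vec_format(wv_path, binary=False)), where wv_path is the path from the question.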
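(Separately: I can't imagine your %EMBEDDING_DIM is doing anything useful. Why would you perform an element-wise modulo, by the integer count of dimensions, on individual dimensions that are typically small floats? For non-negative values it is harmless, since EMBEDDING_DIM is usually far larger than any individual value, but Python's and NumPy's % return a result with the sign of the divisor, so every negative component x becomes EMBEDDING_DIM + x. At best it does nothing; at worst it corrupts the vectors.)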
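A quick check of what the element-wise % does to a made-up vector. EMBEDDING_DIM = 300 and the component values are assumptions for the demo, and a plain list stands in for the NumPy array that model[word] returns; NumPy's % (np.remainder) follows the same sign-of-divisor rule.

```python
EMBEDDING_DIM = 300  # assumed dimensionality, for illustration only

vector = [0.12, -0.05, 0.0, -0.73]  # made-up embedding components

# Element-wise modulo, mirroring model[word] % EMBEDDING_DIM on an array
modded = [x % EMBEDDING_DIM for x in vector]

for before, after in zip(vector, modded):
    print(f"{before:+.2f} -> {after:.2f}")
```

The non-negative components pass through unchanged, while -0.05 comes out as roughly 299.95 and -0.73 as roughly 299.27.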
Answer by 小智 · Votes: 9
All the changes involved in migrating from Gensim 3.x to 4 are documented at this GitHub link:
https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4
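For the problem above, this is the solution that worked for me (it assumes model is a full Word2Vec model, whose vectors live in model.wv; with a KeyedVectors loaded via load_word2vec_format, as in the question, drop the .wv):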
words = list(model.wv.index_to_key)
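A hedged sketch of a helper that returns the word list whether model is a Word2Vec model or a plain KeyedVectors. The name word_list is hypothetical; it relies only on getattr falling back to the object itself when there is no wv attribute, and on index_to_key existing on Gensim 4 KeyedVectors.

```python
def word_list(model):
    """Return the vocabulary as a list for a Gensim 4 Word2Vec model or KeyedVectors.

    A Word2Vec model keeps its vectors in model.wv; a KeyedVectors instance
    (e.g. from KeyedVectors.load_word2vec_format) has no .wv in Gensim 4,
    so fall back to the object itself.
    """
    kv = getattr(model, "wv", model)
    # index_to_key replaces the removed vocab.keys()
    return list(kv.index_to_key)
```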