Understanding how words are stored in a gensim corpus dictionary after using "gensim.corpora.Dictionary(TEXT)"

San*_*ket 3 python corpus gensim

I convert a list of text documents into a corpus dictionary and then into a bag-of-words model using:

dictionary = gensim.corpora.Dictionary(docs) # docs is a list of tokenized documents (lists of words)
corpus = [dictionary.doc2bow(doc) for doc in docs]

We can look up the index values of specific words in the dictionary using:

dictionary.doc2idx(["righteous","height"])

Is there a way to find the word stored at a specific index in the dictionary?

ane*_*shi 5

TL;DR:

dictionary.get(index_of_word)

Example:

import gensim

docs = [['hello', 'world'], ['i', 'am', 'groot']]

dictionary = gensim.corpora.Dictionary(docs) # docs is a list of tokenized documents (lists of words)
corpus = [dictionary.doc2bow(doc) for doc in docs]

print(dictionary.get(0))
print(dictionary.get(3))

Output:

hello
groot
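To see why the ids come out this way, here is a rough pure-Python sketch of the idea. It does not use gensim and is not its actual implementation. The helper `build_token2id` and the variable `id2token` are names made up for this illustration, and the rule that new words in a document get their ids in sorted order is an assumption chosen to match the output above.

```python
def build_token2id(docs):
    """Simplified model: give every previously unseen word the next free integer id."""
    token2id = {}
    for doc in docs:
        # Assumption for this sketch: new words of a document are added in sorted order.
        for token in sorted(set(doc)):
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id


docs = [['hello', 'world'], ['i', 'am', 'groot']]

# Forward mapping: word -> id
token2id = build_token2id(docs)

# Reverse mapping: id -> word, obtained by inverting the forward mapping
id2token = {idx: token for token, idx in token2id.items()}

# Forward lookup, with -1 standing in for words that are not in the mapping
print([token2id.get(word, -1) for word in ['groot', 'missing']])

# Reverse lookup: .get() returns None for an id that does not exist
print(id2token.get(0))
print(id2token.get(3))
print(id2token.get(99))
```

The point of the sketch is that looking up a word by index is just a lookup in the inverted word-to-id mapping, and that using `.get()` gives you `None` for an unknown index instead of raising a `KeyError`.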

Hope this helps!