小编Alf*_*nry的帖子

Word2Vec 词汇结果仅包含字母和符号

我是 Word2Vec 的新手,我正在尝试根据单词的相似性对单词进行聚类。首先,我使用 nltk 来分隔句子,然后使用生成的句子列表作为 Word2Vec 的输入。然而,当我打印词汇时,它只是一堆字母、数字和符号,而不是单词。具体来说,其中一个字母的示例是“< gensim.models.keyedvectors.Vocab object at 0x00000238145AB438>, 'L':”

# imports needed and logging
import gensim
from gensim.models import word2vec
import logging

import nltk
#nltk.download('punkt')
#nltk.download('averaged_perceptron_tagger')
with open('C:\\Users\\Freddy\\Desktop\\Thesis\\Descriptions.txt','r') as f_open:
    text = f_open.read()
arr = []

sentences = nltk.sent_tokenize(text) # this gives a list of sentences

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s',level=logging.INFO)

model = word2vec.Word2Vec(sentences, size = 300)

print(model.wv.vocab)
Run Code Online (Sandbox Code Playgroud)

python tokenize python-3.x gensim word2vec

2
推荐指数
1
解决办法
2524
查看次数

标签 统计

gensim ×1

python ×1

python-3.x ×1

tokenize ×1

word2vec ×1