Related solutions (0)

Understanding an LDA-transformed corpus in Gensim

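I was trying to compare the contents of a BOW corpus with LDA[BOW corpus] (the same corpus transformed by an LDA model trained on it with, for example, 35 topics). I found the following output: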

DOC 1 : [(1522, 1), (2028, 1), (2082, 1), (6202, 1)]  
LDA 1 : [(29, 0.80571428571428572)]  
DOC 2 : [(1522, 1), (5364, 1), (6202, 1), (6661, 1), (6983, 1)]  
LDA 2 : [(29, 0.83809523809523812)]  
DOC 3 : [(3079, 1), (3395, 1), (4874, 1)]  
LDA 3 : [(34, 0.75714285714285712)]  
DOC 4 : [(1482, 1), (2806, 1), (3988, 1)]  
LDA 4 : [(22, 0.50714288283121989), (32, 0.25714283145449457)]  
DOC 5 : [(440, 1), (533, 1), (1264, 1), (2433, 1), (3012, 1), (3902, …
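One way to read these lines: gensim's `lda[bow]` returns a sparse list of `(topic_id, probability)` pairs and omits topics whose probability falls below the model's `minimum_probability` threshold (0.01 by default), so the listed values need not sum to 1. The sketch below is plain Python, not part of the original question. It uses only the four complete LDA lines above and assumes 35 topics as stated in the question; the names `lda_rows`, `NUM_TOPICS`, and `unlisted_mass` are illustrative.

```python
# Sparse topic vectors copied from the "LDA 1" to "LDA 4" lines above.
lda_rows = {
    1: [(29, 0.80571428571428572)],
    2: [(29, 0.83809523809523812)],
    3: [(34, 0.75714285714285712)],
    4: [(22, 0.50714288283121989), (32, 0.25714283145449457)],
}
NUM_TOPICS = 35  # assumption: the topic count mentioned in the question


def unlisted_mass(topic_pairs, num_topics=NUM_TOPICS):
    """Return (probability not listed, average share per unlisted topic)."""
    listed = sum(prob for _, prob in topic_pairs)
    remaining = 1.0 - listed
    unlisted_topics = num_topics - len(topic_pairs)
    return remaining, remaining / unlisted_topics


for doc_id, pairs in lda_rows.items():
    remaining, per_topic = unlisted_mass(pairs)
    print(f"LDA {doc_id}: remaining={remaining:.4f}, "
          f"average per unlisted topic={per_topic:.4f}")
```

If the remaining mass is spread roughly evenly, each unlisted topic sits below 0.01, which would explain why it does not appear. When the full dense distribution is needed, calling `lda.get_document_topics(bow, minimum_probability=0)` should return an entry for every topic.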

python nlp lda gensim

4
votes
1
answer
1774
views

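Python gensim LDA: adding topics back to documents after obtaining the topics

I am using gensim's LDA for topic modeling. I know how to convert raw text data into a corpus and obtain the topics. However, once I have the topics, can I tag the original documents with the topic results or add the results back to them?

Here is my code: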

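import pandas as pd
from nltk.corpus import stopwords
from gensim import corpora, models

# data_path and review_to_words are defined elsewhere (not shown in the question)
movie_reviews = pd.read_csv(data_path + 'movie_review.tsv', header=0, delimiter='\t', quoting=3)

# Tokenize each review and drop English stop words
stops = stopwords.words('english')
reviews = []
for i in range(len(movie_reviews['review'])):
    reviews.append(review_to_words(movie_reviews['review'][i], stops=stops))

# Build the dictionary and the bag-of-words corpus
dictionary = corpora.Dictionary(reviews)
corpus = [dictionary.doc2bow(review) for review in reviews]

# Re-weight with TF-IDF, then train a 10-topic LDA model on the result
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]
lda = models.LdaModel(corpus_tfidf, id2word=dictionary, num_topics=10)
corpus_lda = lda[corpus_tfidf]
lda.print_topics(10)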
[(0,
  u'0.001*ben + 0.001*sinatra + 0.001*santa + 0.001*henry + 0.001*band + 0.001*william + 0.001*fool + 0.001*tragic + 0.001*favourite + 0.001*bed'),
 (1,
  u'0.002*dentist + 0.002*homeless + 0.002*connery + …
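A minimal sketch of one possible approach, not taken from an answer to this question: since the transformed corpus yields one list of `(topic_id, probability)` pairs per document, in the same order as the input, each document can be paired with its highest-probability topic. The helper names `dominant_topic` and `tag_documents` are hypothetical, and the sample data is made up purely for illustration.

```python
def dominant_topic(topic_pairs):
    """Return the (topic_id, probability) pair with the highest probability.

    Returns None when the list is empty (no topic passed the threshold).
    """
    if not topic_pairs:
        return None
    return max(topic_pairs, key=lambda pair: pair[1])


def tag_documents(documents, doc_topics):
    """Pair each document with its dominant topic.

    Both inputs are assumed to be in the same order and of equal length.
    """
    tagged = []
    for doc, pairs in zip(documents, doc_topics):
        top = dominant_topic(pairs)
        tagged.append({
            "document": doc,
            "topic": top[0] if top else None,
            "probability": top[1] if top else None,
        })
    return tagged


# Toy stand-ins (made up for illustration). With the question's code,
# `documents` would be movie_reviews['review'] and `doc_topics` would be
# corpus_lda.
documents = ["first review text", "second review text"]
doc_topics = [[(3, 0.62), (7, 0.31)], [(1, 0.88)]]

for row in tag_documents(documents, doc_topics):
    print(row)
```

With a pandas DataFrame, the same idea could be written as `movie_reviews['topic'] = [row['topic'] for row in tag_documents(movie_reviews['review'], corpus_lda)]`, assuming the corpus was built from the rows in their original order.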

python lda gensim

4
votes
1
answer
2384
views

Tag statistics

gensim ×2

lda ×2

python ×2

nlp ×1