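I'm using gensim's LDA for topic modeling. I know how to convert raw text into a corpus and extract the topics. But once I have the topics, can I tag the original documents with the topic results, or add the results back to them?
Here is my code: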
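import pandas as pd
from gensim import corpora, models
from nltk.corpus import stopwords  # assumed: stopwords comes from NLTK

# data_path and review_to_words are defined elsewhere (not shown here)
movie_reviews = pd.read_csv(data_path + 'movie_review.tsv', header=0, delimiter='\t', quoting=3)

# Clean and tokenize each review; load the stop-word list once, not per review
english_stops = stopwords.words('english')
reviews = [review_to_words(review, stops=english_stops) for review in movie_reviews['review']]

# Build the dictionary and the bag-of-words corpus
dictionary = corpora.Dictionary(reviews)
corpus = [dictionary.doc2bow(review) for review in reviews]

# Re-weight the corpus with TF-IDF
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]

# Train LDA and project every document into topic space
lda = models.LdaModel(corpus_tfidf, id2word=dictionary, num_topics=10)
corpus_lda = lda[corpus_tfidf]
lda.print_topics(10)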
[(0,
u'0.001*ben + 0.001*sinatra + 0.001*santa + 0.001*henry + 0.001*band + 0.001*william + 0.001*fool + 0.001*tragic + 0.001*favourite + 0.001*bed'),
(1,
u'0.002*dentist + 0.002*homeless + 0.002*connery + …
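To make the goal concrete, here is a minimal sketch of the kind of labeling I have in mind. It relies on the fact that iterating over `corpus_lda` yields, for each document and in corpus order, a list of `(topic_id, probability)` pairs. The helper name `dominant_topics` and the sample values are hypothetical, and the hand-made list only stands in for the real `corpus_lda` so the snippet runs without a trained model.

```python
def dominant_topics(doc_topics):
    """Return the most probable (topic_id, probability) pair per document.

    doc_topics: one entry per document, each a list of
    (topic_id, probability) pairs, the shape produced by
    iterating over `lda[corpus]`.
    """
    result = []
    for topics in doc_topics:
        if topics:
            # Pick the pair with the highest probability.
            result.append(max(topics, key=lambda pair: pair[1]))
        else:
            # No topic passed the model's minimum-probability filter.
            result.append((None, 0.0))
    return result


# Hypothetical stand-in for `corpus_lda`: three documents.
example_corpus_lda = [
    [(0, 0.10), (3, 0.85), (7, 0.05)],
    [(1, 0.60), (2, 0.40)],
    [],
]

labels = dominant_topics(example_corpus_lda)
print(labels)
```

With the real objects, `dominant_topics(corpus_lda)` should return one pair per review in the same order as `movie_reviews`, so the topic IDs could presumably be stored as a new column, for example `movie_reviews['topic'] = [t for t, _ in dominant_topics(corpus_lda)]`.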