alv*_*vas 8 python lda gensim topic-modeling
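The lda.show_topics call in the code below only prints the distribution of the top 10 words for each topic. How do I print the full distribution over all the words in the corpus?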
from gensim import corpora, models

# Toy corpus of nine short documents
documents = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time",
             "The EPS user interface management system",
             "System and human system engineering testing of EPS",
             "Relation of user perceived response time to error measurement",
             "The generation of random binary unordered trees",
             "The intersection graph of paths in trees",
             "Graph minors IV Widths of trees and well quasi ordering",
             "Graph minors A survey"]

# Lowercase, split on whitespace, and drop stopwords
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

# Map each word to an integer id and convert the documents to bag-of-words vectors
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Assumed step: build the TF-IDF-weighted corpus that the LdaModel call below expects
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]

lda = models.ldamodel.LdaModel(corpus_tfidf, id2word=dictionary, num_topics=2)

# By default this prints only the top 10 words of each topic
for i in lda.show_topics():
    print(i)
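show_topics() takes a parameter that sets how many of the top N words to include in each topic's word distribution. Older gensim releases call it topn; current releases call it num_words. See http://radimrehurek.com/gensim/models/ldamodel.html
So instead of calling lda.show_topics() with its defaults, pass len(dictionary) as that parameter to get the full word distribution for each topic: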
# On older gensim releases, pass topn=len(dictionary) instead of num_words
for i in lda.show_topics(num_words=len(dictionary)):
    print(i)
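If you would rather have the full distributions as data than as formatted strings, here is a minimal sketch of one way to collect them. The helper name full_topic_distributions is made up for illustration. It assumes the model exposes num_topics and show_topic(topic_id, topn=...), as gensim's LdaModel does, and it does not depend on the order of the elements inside each returned pair, which has differed between gensim versions.

```python
def full_topic_distributions(model, vocab_size):
    """Return {topic_id: list of word/probability pairs} over the whole vocabulary.

    Assumption: `model` has a `num_topics` attribute and a
    `show_topic(topic_id, topn=...)` method, like gensim's LdaModel.
    Asking for `vocab_size` entries means no word is cut off.
    """
    return {
        topic_id: model.show_topic(topic_id, topn=vocab_size)
        for topic_id in range(model.num_topics)
    }


def print_full_topics(model, vocab_size):
    """Print every word/probability pair for every topic, one pair per line."""
    for topic_id, pairs in full_topic_distributions(model, vocab_size).items():
        print("topic", topic_id)
        for pair in pairs:
            print("  ", pair)
```

With a trained model and its dictionary in scope, the call would look like print_full_topics(lda, len(dictionary)).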