Related solutions (0)

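Spark MLlib LDA: how to infer the topic distribution of a new, unseen document?

I am interested in applying LDA topic modeling with Spark MLlib. I have gone through the code and explanation here, but I cannot find how to use the trained model to obtain the topic distribution of a new, unseen document.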

lda topic-modeling apache-spark apache-spark-mllib

Ram*_*ami

2016-04-25

14
votes

1
answer

3964
views
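For the question above about unseen documents, one generic approach is "fold-in" inference: hold the learned topics fixed and estimate only the new document's topic proportions. The sketch below is plain Python and is not Spark's implementation. The function name `infer_topic_distribution`, the `smoothing` pseudo-count, and the toy topic matrix are all hypothetical. It assumes a topic-word matrix with one row per topic, each row summing to 1; a matrix obtained from `ldaModel.topicsMatrix()` is laid out as vocabSize by k, so it would presumably need transposing and normalizing first.

```python
def infer_topic_distribution(word_counts, topic_word, n_iter=50, smoothing=0.01):
    """Estimate topic proportions for one document with the topics held fixed.

    word_counts: list of counts, one per vocabulary term.
    topic_word:  list of k rows; row t holds P(term | topic t) and sums to 1.
    smoothing:   hypothetical pseudo-count added to every topic (an assumption,
                 not a Spark parameter).
    """
    k = len(topic_word)
    theta = [1.0 / k] * k  # start from a uniform topic mixture
    for _ in range(n_iter):
        expected = [smoothing] * k  # pseudo-counts keep every topic non-zero
        for w, count in enumerate(word_counts):
            if count == 0:
                continue
            # E-step: how much each topic is responsible for term w
            weights = [theta[t] * topic_word[t][w] for t in range(k)]
            total = sum(weights)
            if total == 0.0:
                continue  # term is impossible under every topic; skip it
            for t in range(k):
                expected[t] += count * weights[t] / total
        # M-step: renormalize the expected counts into proportions
        norm = sum(expected)
        theta = [e / norm for e in expected]
    return theta


# Hypothetical toy model: 2 topics over a 4-term vocabulary
topics = [
    [0.45, 0.45, 0.05, 0.05],
    [0.05, 0.05, 0.45, 0.45],
]
new_doc = [5, 4, 0, 1]  # word counts of the unseen document
theta = infer_topic_distribution(new_doc, topics)
print(theta)
```

The returned list has one entry per topic and sums to 1. In the toy case the document is dominated by the first two terms, so most of the weight should land on the first topic.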

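Latent Dirichlet Allocation (LDA) in Spark

I am trying to write a program in Spark to perform Latent Dirichlet Allocation (LDA). This Spark documentation page provides a good example of running LDA on sample data. Here is the program: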

from pyspark.mllib.clustering import LDA, LDAModel
from pyspark.mllib.linalg import Vectors

# Load and parse the data
data = sc.textFile("data/mllib/sample_lda_data.txt")
parsedData = data.map(lambda line: Vectors.dense([float(x) for x in line.strip().split(' ')]))
# Index documents with unique IDs
corpus = parsedData.zipWithIndex().map(lambda x: [x[1], x[0]]).cache()

# Cluster the documents into three topics using LDA
ldaModel = LDA.train(corpus, k=3)

# Output topics. Each is a distribution over words (matching word count vectors)
print("Learned topics (as distributions over vocab of " + str(ldaModel.vocabSize()) …


python lda pyspark

pra*_*nth

lucky-day

9
votes

1
answer

5439
views