How to print prediction probabilities with LogisticRegressionWithLBFGS in PySpark

Hem*_*iah 1 machine-learning logistic-regression apache-spark pyspark apache-spark-mllib

I am using Spark 1.5.1, and in PySpark I fit a model with:

model = LogisticRegressionWithLBFGS.train(parsedData)

I can print the predictions with:

model.predict(p.features)

Is there a function that prints the probability scores along with the predictions?

des*_*aut 7

You have to clear the threshold first. Note that this works only for binary classification:

 from pyspark.mllib.classification import LogisticRegressionWithLBFGS, LogisticRegressionModel
 from pyspark.mllib.regression import LabeledPoint

 parsed_data = [LabeledPoint(0.0, [4.6,3.6,1.0,0.2]),
                LabeledPoint(0.0, [5.7,4.4,1.5,0.4]),
                LabeledPoint(1.0, [6.7,3.1,4.4,1.4]),
                LabeledPoint(0.0, [4.8,3.4,1.6,0.2]),
                LabeledPoint(1.0, [4.4,3.2,1.3,0.2])]   

 # sc is the SparkContext that the pyspark shell creates automatically
 model = LogisticRegressionWithLBFGS.train(sc.parallelize(parsed_data))
 model.threshold
 # 0.5
 model.predict(parsed_data[2].features)
 # 1

 model.clearThreshold()  # predict() now returns the probability of class 1 instead of a 0/1 label
 model.predict(parsed_data[2].features)
 # 0.9873840020002339
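To see why clearing the threshold turns the output into a probability, here is a plain-Python sketch of the binary case, written to mirror (as far as I understand it) what the mllib `LogisticRegressionModel.predict` does. It computes the margin `w . x + b`, passes it through the sigmoid, and either returns that probability (no threshold, like after `clearThreshold()`) or compares it with the threshold to return 0/1. The `weights` and `intercept` arguments stand in for `model.weights` and `model.intercept`. The names `logistic_predict` and `predict_with_probability` are hypothetical helpers for illustration, not part of PySpark.

```python
import math
from typing import Optional, Sequence, Tuple, Union


def logistic_predict(weights: Sequence[float], intercept: float,
                     features: Sequence[float],
                     threshold: Optional[float] = 0.5) -> Union[int, float]:
    """Binary logistic prediction: a 0/1 label if threshold is set, else P(class 1)."""
    # Linear margin w . x + b
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    # Sigmoid of the margin, split into two branches so exp() never overflows
    if margin > 0:
        prob = 1.0 / (1.0 + math.exp(-margin))
    else:
        exp_margin = math.exp(margin)
        prob = exp_margin / (1.0 + exp_margin)
    if threshold is None:
        # Same idea as calling clearThreshold(): return the raw probability
        return prob
    # Threshold set: return a hard label
    return 1 if prob > threshold else 0


def predict_with_probability(weights: Sequence[float], intercept: float,
                             features: Sequence[float],
                             threshold: float = 0.5) -> Tuple[int, float]:
    """Hypothetical helper: return (label, probability) from a single computation."""
    prob = logistic_predict(weights, intercept, features, threshold=None)
    return (1 if prob > threshold else 0), prob


if __name__ == "__main__":
    weights = [0.5, -1.0]  # hypothetical coefficients, not from the model above
    label, prob = predict_with_probability(weights, 0.0, [4.0, 1.0])
    print(label, prob)
```

With a real Spark model you get the same pair by calling `model.clearThreshold()` to read the probability and `model.setThreshold(0.5)` to switch `predict()` back to 0/1 labels.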