How to interpret the probability column in Spark logistic regression predictions?

Wal*_*Cat 3 machine-learning logistic-regression apache-spark apache-spark-sql apache-spark-ml

I am generating predictions with spark.ml.classification.LogisticRegressionModel.predict. A number of rows have 1.0 in the prediction column but .04 in the probability column. model.getThreshold is 0.5, so I would assume the model classifies everything above the 0.5 probability threshold as 1.0.

How should I interpret a result with a prediction of 1.0 and a probability of 0.04?
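For context, a minimal sketch of the calls in question (column names are the spark.ml defaults; testData is just a stand-in name for my input DataFrame):

// model is a fitted spark.ml LogisticRegressionModel
val out = model.transform(testData)  // adds rawPrediction, probability and prediction columns
out.select("probability", "prediction").show(false)
println(model.getThreshold)          // 0.5, the default for binary problems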

Sha*_*ica 6

The probability column produced by a LogisticRegression should contain a vector with the same length as the number of classes, where each index gives the probability of the corresponding class. I made a small example with two classes to illustrate:

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import spark.implicits._  // for .toDF()

case class Person(label: Double, age: Double, height: Double, weight: Double)
val df = List(Person(0.0, 15, 175, 67),
      Person(0.0, 30, 190, 100),
      Person(1.0, 40, 155, 57),
      Person(1.0, 50, 160, 56),
      Person(0.0, 15, 170, 56),
      Person(1.0, 80, 180, 88)).toDF()

// Assemble the raw columns into a single "features" vector column.
val assembler = new VectorAssembler()
  .setInputCols(Array("age", "height", "weight"))
  .setOutputCol("features")
val df2 = assembler.transform(df).select("label", "features")  // .select belongs on the DataFrame, not on the assembler
df2.show

+-----+------------------+
|label|          features|
+-----+------------------+
|  0.0| [15.0,175.0,67.0]|
|  0.0|[30.0,190.0,100.0]|
|  1.0| [40.0,155.0,57.0]|
|  1.0| [50.0,160.0,56.0]|
|  0.0| [15.0,170.0,56.0]|
|  1.0| [80.0,180.0,88.0]|
+-----+------------------+

// Logistic regression with elastic-net regularization.
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.3).setElasticNetParam(0.8)
// Note: as written, `testing` receives the 0.7 split of this (tiny) dataset.
val Array(testing, training) = df2.randomSplit(Array(0.7, 0.3))

val model = lr.fit(training)
val predictions = model.transform(testing)
predictions.select("probability", "prediction").show(false)


+----------------------------------------+----------+
|probability                             |prediction|
+----------------------------------------+----------+
|[0.7487950501224138,0.2512049498775863] |0.0       |
|[0.6458452667523259,0.35415473324767416]|0.0       |
|[0.3888393314864866,0.6111606685135134] |1.0       |
+----------------------------------------+----------+

These are the probabilities produced by the algorithm together with the final prediction. The class with the highest probability is the one predicted; with the default binary threshold of 0.5 this amounts to predicting 1.0 whenever probability(1) exceeds 0.5. The 0.04 you are seeing is therefore most likely probability(0), the probability of class 0.0, which is entirely consistent with a prediction of 1.0.
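If you want the probability of the predicted class as a flat column, you can pull it out of the vector with a UDF; a minimal sketch (the column name predictedProb is just an example):

import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.udf

// Pick the entry of the probability vector that belongs to the predicted class.
val probOfPrediction = udf { (probability: Vector, prediction: Double) =>
  probability(prediction.toInt)
}

predictions
  .withColumn("predictedProb", probOfPrediction($"probability", $"prediction"))
  .show(false)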

  • Hi Shaido. How do we associate each probability with its corresponding class? As we can see here, prob[0] actually comes from class 0 and prob[1] from class 1. Where does it say that prob[0] does not correspond to class 1? (3 upvotes)
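For reference, a minimal way to check the mapping yourself (a sketch relying on the spark.ml convention that labels are indexed 0, 1, ..., numClasses - 1, so probability(i) is the probability of label i):

import org.apache.spark.ml.linalg.Vector

// With the default 0.5 threshold, the prediction is the index of the
// largest entry in the probability vector, and that index is the label itself.
val row = predictions.select("probability", "prediction").head
val prob = row.getAs[Vector]("probability")
val pred = row.getDouble(1)
assert(prob.argmax == pred.toInt)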