mah*_*h65 2 machine-learning pyspark apache-spark-ml multiclass-classification
I have trained a model and want to compute several important metrics, such as accuracy, precision, recall, and f1 score.
The process I followed is:
from pyspark.ml.classification import LogisticRegression
lr = LogisticRegression(featuresCol='features',labelCol='label')
lrModel = lr.fit(train)
lrPredictions = lrModel.transform(test)
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.evaluation import BinaryClassificationEvaluator
eval_accuracy = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="accuracy")
eval_precision = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="precision")
eval_recall = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="recall")
eval_f1 = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction", metricName="f1Measure")
eval_auc = BinaryClassificationEvaluator(labelCol="label", rawPredictionCol="prediction")
accuracy = eval_accuracy.evaluate(lrPredictions)
precision = eval_precision.evaluate(lrPredictions)
recall = eval_recall.evaluate(lrPredictions)
f1score = eval_f1.evaluate(lrPredictions)
auc = eval_auc.evaluate(lrPredictions)
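However, it can only compute accuracy and auc, not the other three. What should I change here?
According to the documentation, for the F1 measure, precision, and recall, the relevant parameters of MulticlassClassificationEvaluator should be, respectively: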
metricName="f1"
metricName="precisionByLabel"
metricName="recallByLabel"
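To make clear what these metric names measure, here is a minimal pure-Python sketch that computes the same quantities by hand on a small, made-up list of (label, prediction) pairs. It assumes that precisionByLabel and recallByLabel are computed for a single class (in Spark 3.x this is selected with the evaluator's metricLabel parameter, which defaults to 0.0) and that f1 is the per-class F1 averaged with weights proportional to each class's frequency among the true labels. The helper names per_label_metrics and weighted_f1 and the sample data are hypothetical and are not part of PySpark.

```python
from collections import Counter

def per_label_metrics(pairs, label):
    """Return (precision, recall, f1) for one class from (label, prediction) pairs."""
    tp = sum(1 for y, p in pairs if y == label and p == label)
    fp = sum(1 for y, p in pairs if y != label and p == label)
    fn = sum(1 for y, p in pairs if y == label and p != label)
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

def weighted_f1(pairs):
    """Average the per-class F1 scores, weighted by true-label frequency."""
    counts = Counter(y for y, _ in pairs)
    total = len(pairs)
    return sum(per_label_metrics(pairs, lbl)[2] * n / total for lbl, n in counts.items())

# Hypothetical (label, prediction) pairs for a three-class problem.
pairs = [(0.0, 0.0), (0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (2.0, 2.0)]

precision_0, recall_0, f1_0 = per_label_metrics(pairs, 0.0)
accuracy = sum(1 for y, p in pairs if y == p) / len(pairs)
overall_f1 = weighted_f1(pairs)
print(precision_0, recall_0, f1_0, accuracy, overall_f1)
```

If a single number across all classes is wanted instead of a per-class value, the evaluator also accepts metricName="weightedPrecision" and metricName="weightedRecall".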