
Spark ML evaluation methods

I have a Spark DataFrame as follows:

predictions.show(5)
+------+----+------+-----------+
|  user|item|rating| prediction|
+------+----+------+-----------+
|379433|  31|     1| 0.08203495|
|  1834|  31|     1|  0.4854447|
|422635|  31|     1|0.017672742|
|   839|  31|     1| 0.39273006|
| 51444|  31|     1| 0.09795039|
+------+----+------+-----------+
only showing top 5 rows


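The prediction column is the predicted rating, and the rating column is the implicit rating (a count).

Now I want to check the AUC of my recommendation algorithm.

I first tried pyspark.ml.evaluation.BinaryClassificationEvaluator, because it works directly on DataFrames.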

# getting the evaluation metric

from pyspark.ml.evaluation import BinaryClassificationEvaluator

evaluator = BinaryClassificationEvaluator(rawPredictionCol="prediction")
print evaluator.evaluate(predictions)

This gave me the following error:

---------------------------------------------------------------------------
IllegalArgumentException                  Traceback (most recent call last)
<ipython-input-65-c642ea9c2cf5> in <module>()
      4 
      5 evaluator = BinaryClassificationEvaluator(rawPredictionCol="prediction")
----> 6 print evaluator.evaluate(predictions)
      7 
      8 #print evaluator.evaluate(predictions, {evaluator.metricName: "areaUnderPR"})

/Users/i854319/spark/python/pyspark/ml/evaluation.py in …
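The traceback above is truncated, so the actual cause of the IllegalArgumentException is not visible here. As a minimal sketch under stated assumptions, the function below shows one way to prepare such a DataFrame before calling the evaluator: it derives a 0/1 label column from the implicit counts and casts the prediction column to double. The function name, the "label" column, and the rule "any positive count is a positive example" are all assumptions made for illustration, not part of the question.

```python
def auc_from_implicit_predictions(predictions):
    """Sketch: area under ROC for a DataFrame with 'rating' and 'prediction' columns."""
    # Imports are local so the function can be defined without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    prepared = (
        predictions
        # Assumption: any positive count is treated as a positive example.
        .withColumn("label", (F.col("rating") > 0).cast("double"))
        # ALS emits a float prediction column; cast it to double for the evaluator.
        .withColumn("prediction", F.col("prediction").cast("double"))
    )
    evaluator = BinaryClassificationEvaluator(
        rawPredictionCol="prediction",
        labelCol="label",          # hypothetical column created above
        metricName="areaUnderROC",
    )
    return evaluator.evaluate(prepared)
```

AUC is only meaningful when the evaluated rows contain both positive and negative labels. The five rows shown all have rating 1, so the held-out data would also need non-interactions (rating 0) for this to produce a useful number.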


python apache-spark pyspark apache-spark-ml apache-spark-mllib

Bak*_*war

2019 10-03

9
votes

1
answer

2321
views

Spark: measuring the performance of ALS

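I am using the ALS model from spark.ml to build a recommender system that uses implicit feedback for a specific collection of items. I have noticed that the model's output predictions are far below 1, usually in the interval [0, 0.1]. Using MAE or MSE therefore makes no sense in this case.

So I use the area under the ROC curve (AUC) to measure performance. I do this with Spark's BinaryClassificationEvaluator, and I do get a value close to 0.8. However, I cannot clearly understand how that is possible, since most of the values lie in [0, 0.1].

As I understand it, beyond a certain threshold the evaluator will treat all predictions as belonging to class 0. Would that essentially mean the AUC equals the percentage of negative samples?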
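As a small illustration of why low scores do not by themselves limit AUC, the sketch below computes AUC directly from its ranking definition: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. The helper `auc` and the toy labels and scores are made up for this example and are not taken from the question's data.

```python
def auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: every score lies in [0, 0.1].
labels = [1, 1, 1, 0, 0, 0, 0]
small_scores = [0.09, 0.08, 0.02, 0.05, 0.03, 0.01, 0.005]

auc_small = auc(labels, small_scores)                      # 10 of 12 pairs ranked correctly
auc_scaled = auc(labels, [10 * s for s in small_scores])   # same ordering, larger scores
auc_constant = auc(labels, [0.0] * len(labels))            # every pair is a tie
```

In this toy case the AUC is 10/12 (about 0.83) whether or not the scores are rescaled, because only the ordering matters. If every prediction were identical, this definition gives 0.5 regardless of the share of negative samples.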

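More generally, if you needed to compare the model's performance with that of logistic regression, how would you handle such low values?

I train the model as follows: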

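from pyspark.ml.recommendation import ALS

rank = 25           # number of latent factors
alpha = 1.0         # confidence scaling for implicit feedback
numIterations = 10  # maximum number of ALS iterations
# implicitPrefs=True treats "response" as implicit feedback rather than explicit ratings
als = ALS(rank=rank, maxIter=numIterations, alpha=alpha, userCol="id", itemCol="itemid", ratingCol="response", implicitPrefs=True, nonnegative=True)
als.setRegParam(0.01)
model = als.fit(train)  # train: the training DataFrame (not shown in the question)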


machine-learning apache-spark pyspark

ml_*_*_0x

lucky-day

5
votes

1
answer

2539
views

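Running a cross-validation estimator in Spark

I am building a recommender system in Spark. So far I have been able to run and evaluate the algorithm on my dataset using initial, manually chosen hyperparameter values. I would like to automate this by letting a cross-validation estimator choose from a grid of hyperparameter values, so I wrote the following function for that purpose: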

def recommendation(train):
    """ This function trains a collaborative filtering 
    algorithm on a ratings training data

    We use a Cross Validator and Grid Search to find the right hyper-parameter values



    Param: 
    train----> training data

    TUNING PARAMETERS: 
    alpha----> Alpha value to calculate the confidence matrix (only for implicit datasets)
    rank-----> no. of latent factors of the resulting X, Y matrix
    reg------> regularization parameter for penalising the X, Y factors


    Returns: 
    model-> ALS model object

    """


    from pyspark.ml.tuning import CrossValidator, …
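The function above is cut off in the source, so its body is unknown. As a separate, minimal sketch of the setup the question describes, the helper below wires an ALS estimator, a parameter grid over rank, regParam and alpha, and a CrossValidator together. The function name, the column names, the grid values, and the choice of RegressionEvaluator with RMSE are all assumptions for illustration. RMSE in particular is only a placeholder metric and may not suit implicit-feedback data.

```python
def build_als_cross_validator(num_folds=3):
    """Sketch: a CrossValidator that searches a small ALS hyperparameter grid."""
    # Imports are local so the function can be defined without Spark installed.
    from pyspark.ml.evaluation import RegressionEvaluator
    from pyspark.ml.recommendation import ALS
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    # Hypothetical column names; adjust to the actual training DataFrame.
    als = ALS(
        userCol="user",
        itemCol="item",
        ratingCol="rating",
        implicitPrefs=True,
        coldStartStrategy="drop",  # drop NaN predictions for users/items unseen in a fold
    )
    # Illustrative grid values, not recommendations.
    grid = (
        ParamGridBuilder()
        .addGrid(als.rank, [10, 25])
        .addGrid(als.regParam, [0.01, 0.1])
        .addGrid(als.alpha, [1.0, 10.0])
        .build()
    )
    evaluator = RegressionEvaluator(
        metricName="rmse", labelCol="rating", predictionCol="prediction"
    )
    return CrossValidator(
        estimator=als,
        estimatorParamMaps=grid,
        evaluator=evaluator,
        numFolds=num_folds,
    )
```

With an active Spark session, something like `build_als_cross_validator().fit(train).bestModel` would then return the ALS model fitted with the best-scoring parameter combination.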


python apache-spark pyspark apache-spark-mllib

Bak*_*war

lucky-day

5
votes

0
answers

1738
views