Posts by use*_*714

Spark: regression model threshold and precision

I have a logistic regression model, and I explicitly set the threshold to 0.5:

model.setThreshold(0.5)

I train the model, and then I want to get basic statistics: precision, recall, etc.

This is what I do when evaluating the model:

import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

val metrics = new BinaryClassificationMetrics(predictionAndLabels)

val precision = metrics.precisionByThreshold

precision.foreach { case (t, p) =>
  println(s"Threshold is: $t, Precision is: $p")
}

The results I get have only 0.0 and 1.0 as thresholds; 0.5 is ignored completely.

Here is the output of the loop above:

Threshold is: 1.0, Precision is: 0.8571428571428571
Threshold is: 0.0, Precision is: 0.3005181347150259

When I call metrics.thresholds(), it also returns only two values, 0.0 and 1.0.

How can I get the precision and recall values for a threshold of 0.5?
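If `predictionAndLabels` was built with `model.predict` after `setThreshold(0.5)`, every prediction is already 0.0 or 1.0, which would explain seeing only the thresholds 0.0 and 1.0. In spark.mllib, calling `model.clearThreshold()` before predicting makes `predict` return raw probability scores instead. Even then, the thresholds `BinaryClassificationMetrics` reports by default are the distinct score values, so a row for exactly 0.5 appears only if some score equals 0.5. A minimal sketch, in plain Python rather than the Spark API, that computes precision and recall at one fixed threshold from (score, label) pairs; the name `precision_recall_at` is hypothetical, and counting `score >= threshold` as a positive prediction is an assumption of this sketch.

```python
from typing import Iterable, Tuple


def precision_recall_at(
    score_and_labels: Iterable[Tuple[float, float]], threshold: float = 0.5
) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold counts as a positive prediction."""
    tp = fp = fn = 0
    for score, label in score_and_labels:
        # Assumption: a score exactly equal to the threshold counts as positive.
        predicted_positive = score >= threshold
        actual_positive = label == 1.0
        if predicted_positive and actual_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif actual_positive:
            fn += 1
    # Return 0.0 instead of dividing by zero when a denominator is empty (a choice made for this sketch).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With a small evaluation set, the (score, label) pairs could be collected to the driver and passed in directly; for a large RDD the same three counts would be computed with Spark operations instead.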

apache-spark apache-spark-mllib

Score: 4 | Answers: 1 | Views: 2409

Python: executing a shell command

I need to do this:

paste file1 file2 file3 > result

I have the following in my Python script:

from subprocess import call

# other code here.

# Here is how I call the shell command
call(["paste", "file1", "file2", "file3", ">", "result"])

Unfortunately, I get this error:

paste: >: No such file or directory.

Any help would be great!
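`call` hands each list element to `paste` as a literal argument; `>` redirection is performed by a shell, and no shell is involved here, so `paste` looks for a file literally named `>`. A minimal sketch that performs the redirection in Python by passing an open file as `stdout` (the helper name `run_to_file` is hypothetical). Running the whole string with `shell=True`, as in `call("paste file1 file2 file3 > result", shell=True)`, would also work, at the cost of handing quoting to the shell.

```python
import subprocess
from typing import Sequence


def run_to_file(cmd: Sequence[str], output_path: str) -> None:
    """Run cmd without a shell and write its standard output to output_path."""
    # Python opens (and truncates) the file, doing the job of the shell's '>' redirection.
    with open(output_path, "w") as out:
        # check=True raises CalledProcessError if the command exits with a non-zero status.
        subprocess.run(list(cmd), stdout=out, check=True)
```

For the question's case this would be `run_to_file(["paste", "file1", "file2", "file3"], "result")`. `subprocess.run` needs Python 3.5 or later; on older versions `call([...], stdout=out)` accepts the same `stdout` argument.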

python subprocess

Score: 1 | Answers: 2 | Views: 387

Generate an RDD of unique values from a pair RDD

I have a Spark RDD of this data type: RDD[(Int, Array[Int])]

Sample values of this RDD are:

(100, Array(1, 2, 3, 4, 5))
(200, Array(1, 2, 50, 20))
(300, Array(30, 2, 400, 1))

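I want to get all unique values across all the Array elements of this RDD. I don't care about the keys; I just want all the unique values. So the result for the example above would be (1, 2, 3, 4, 5, 20, 30, 50, 400).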

What is an efficient way to do this?
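In Spark, a common approach is to flatten the arrays and then deduplicate, for example `rdd.flatMap(_._2).distinct()` (both are standard RDD methods; `distinct` shuffles data, and the result order is not guaranteed). The sketch below shows the same flatten-then-deduplicate idea in plain Python on the sample data, not Spark itself; `unique_values` is a hypothetical name, and the sorting is only there so the output can be compared with the expected list.

```python
from typing import Iterable, List, Tuple


def unique_values(pairs: Iterable[Tuple[int, List[int]]]) -> List[int]:
    """Collect every distinct array element, ignoring the keys; sorted for readability."""
    seen = set()
    for _key, values in pairs:  # keys are ignored, as in the question
        seen.update(values)
    return sorted(seen)


sample = [
    (100, [1, 2, 3, 4, 5]),
    (200, [1, 2, 50, 20]),
    (300, [30, 2, 400, 1]),
]
print(unique_values(sample))  # [1, 2, 3, 4, 5, 20, 30, 50, 400]
```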

scala apache-spark

Score: 1 | Answers: 1 | Views: 2164

scala sortWith: negative numbers are not being sorted

val reslist = List(200.0,-100.00,50.80,-400.83, 800.003,-6.513214114672146E85, -1.2425461624057028E86, -4.7624471630469706E86, -3.6046499228286203E86, 0.0, -8.833653923554989E85, 0.000, -4.795843631190487E85, -5.34142100270833E86, -3.48087737474366E85, -2.811146396971388E86, -6.923235225460886E86, -6.513214114672146E85, 0.00000, -1.2425461624057028E86, -7.073704018243951E85, -9.633244016491059E86, -1.1418901590222212E86, -2.115257701350766E86, -1.1418901590222212E86, -3.48087737474366E85,-1.0676381955303372E86,500.56, 2.900556,400.56,-48956.00,4509.0005); 

val weightlistzi = reslist.zipWithIndex 
// List((200.0,0), (-100.0,1), (50.8,2), (-400.83,3), (800.003,4), (-6.513214114672146E85,5), (-1.2425461624057028E86,6), (-4.7624471630469706E86,7), (-3.6046499228286203E86,8), (0.0,9), (-8.833653923554989E85,10), (0.0,11), (-4.795843631190487E85,12), (-5.34142100270833E86,13), (-3.48087737474366E85,14), (-2.811146396971388E86,15), (-6.923235225460886E86,16), (-6.513214114672146E85,17), (0.0,18), (-1.2425461624057028E86,19), (-7.073704018243951E85,20), (-9.633244016491059E86,21), (-1.1418901590222212E86,22), (-2.115257701350766E86,23), (-1.1418901590222212E86,24), (-3.48087737474366E85,25), (-1.0676381955303372E86,26), (500.56,27), (2.900556,28), (400.56,29), (-48956.0,30), (4509.0005,31)) 

// I am sorting it here.
val resultlist = weightlistzi.sortWith { (x: (Double,Int), y: (Double,Int)) => x._1 …
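Comparing the numeric first elements directly orders negative values, including ones like -9.6E86, correctly, because Double comparison is numeric; in Scala that would be, for example, `weightlistzi.sortWith((x, y) => x._1 < y._1)` or `weightlistzi.sortBy(_._1)`. A plain-Python sketch of the same ascending sort on a subset of the question's (value, index) pairs; `sort_by_value` is a hypothetical name.

```python
from typing import List, Tuple


def sort_by_value(indexed: List[Tuple[float, int]]) -> List[Tuple[float, int]]:
    """Sort (value, original_index) pairs by value, ascending; ties keep their input order."""
    return sorted(indexed, key=lambda pair: pair[0])


values = [200.0, -100.00, 50.80, -400.83, -9.633244016491059e86, 0.0, -48956.00, 4509.0005]
indexed = [(v, i) for i, v in enumerate(values)]  # same (value, index) shape as zipWithIndex
print([i for _, i in sort_by_value(indexed)])  # [4, 6, 3, 1, 5, 2, 0, 7]
```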

scala

Score: -1 | Answers: 1 | Views: 112