Why won't this Spark sample code load in spark-shell?

Dea*_*lze 7 scala apache-spark

The sample code below is from the book Advanced Analytics with Spark. When I load it into spark-shell (version 1.4.1), it gives the following errors, indicating that it cannot find StatCounter:

import org.apache.spark.util.StatCounter
<console>:9: error: not found: type StatCounter
        val stats: StatCounter = new StatCounter()
                   ^
<console>:9: error: not found: type StatCounter
        val stats: StatCounter = new StatCounter()
                                     ^
<console>:23: error: not found: type NAStatCounter
        def apply(x: Double) = new NAStatCounter().add(x)

If I simply type the following directly into spark-shell, there is no problem:

scala> import org.apache.spark.util.StatCounter
import org.apache.spark.util.StatCounter

scala> val statsCounter: StatCounter = new StatCounter()
statsCounter: org.apache.spark.util.StatCounter = (count: 0, mean: 0.000000, stdev: NaN, max: -Infinity, min: Infinity)

The problem seems to be related to the :load command in spark-shell.

Here is the code:

import org.apache.spark.util.StatCounter
class NAStatCounter extends Serializable {
    val stats: StatCounter = new StatCounter()
    var missing: Long = 0

    def add(x: Double): NAStatCounter = {
        if (java.lang.Double.isNaN(x)) {
            missing += 1
        } else {
            stats.merge(x)
        }
        this
    }

    def merge(other: NAStatCounter): NAStatCounter = {
        stats.merge(other.stats)
        missing += other.missing
        this
    }

    override def toString = {
        "stats: " + stats.toString + " NaN: " + missing
    }
}

object NAStatCounter extends Serializable {
    def apply(x: Double) = new NAStatCounter().add(x)
}

小智 3

I had exactly the same problem.
I worked around it, along the lines of what you tried, by changing

val stats: StatCounter = new StatCounter() 

to

val stats: org.apache.spark.util.StatCounter = new org.apache.spark.util.StatCounter()  

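The reason may be that, when the file is loaded, the shell cannot resolve the short name StatCounter, whereas the fully qualified name does not depend on the import.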
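
For reference, here is a minimal sketch of the whole file from the question with that change applied. It assumes Spark is on the classpath (as it is inside spark-shell) and simply drops the import in favor of fully qualified names; the class body is otherwise the same as in the question, apart from an explicit return type on apply.

```scala
// Sketch: NAStatCounter from the question, with StatCounter referenced
// by its fully qualified name so the file does not rely on an import.
class NAStatCounter extends Serializable {
    val stats: org.apache.spark.util.StatCounter = new org.apache.spark.util.StatCounter()
    var missing: Long = 0

    // Count NaN values separately; feed everything else to the StatCounter.
    def add(x: Double): NAStatCounter = {
        if (java.lang.Double.isNaN(x)) {
            missing += 1
        } else {
            stats.merge(x)
        }
        this
    }

    // Combine another counter's statistics and missing count into this one.
    def merge(other: NAStatCounter): NAStatCounter = {
        stats.merge(other.stats)
        missing += other.missing
        this
    }

    override def toString = {
        "stats: " + stats.toString + " NaN: " + missing
    }
}

object NAStatCounter extends Serializable {
    def apply(x: Double): NAStatCounter = new NAStatCounter().add(x)
}
```

As a quick check after loading, NAStatCounter(1.0).add(Double.NaN).add(3.0) should report missing equal to 1 and stats.count equal to 2.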