pti*_*obj 11 scala mapreduce apache-spark
The official Spark documentation has an accumulator example in which the accumulator is used inside a foreach called directly on an RDD:
scala> val accum = sc.accumulator(0)
accum: spark.Accumulator[Int] = 0
scala> sc.parallelize(Array(1, 2, 3, 4)).foreach(x => accum += x)
...
10/09/29 18:41:08 INFO SparkContext: Tasks finished in 0.317106 s
scala> accum.value
res2: Int = 10
I implemented my own counter with an accumulator:
val myCounter = sc.accumulator(0)
val myRDD = sc.textFile(inputpath) // :spark.RDD[String]
myRDD.flatMap(line => foo(line)) // line 69
def foo(line: String) = {
myCounter += 1 // line 82 throwing NullPointerException
// compute something on the input
}
println(myCounter.value)
This works fine in a local setup. However, when I run this job on a Spark standalone cluster with several machines, the workers throw a
13/07/22 21:56:09 ERROR executor.Executor: Exception in task ID 247
java.lang.NullPointerException
at MyClass$.foo(MyClass.scala:82)
at MyClass$$anonfun$2.apply(MyClass.scala:67)
at MyClass$$anonfun$2.apply(MyClass.scala:67)
at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:400)
at spark.PairRDDFunctions.writeToFile$1(PairRDDFunctions.scala:630)
at spark.PairRDDFunctions$$anonfun$saveAsHadoopDataset$2.apply(PairRDDFunctions.scala:640)
at spark.PairRDDFunctions$$anonfun$saveAsHadoopDataset$2.apply(PairRDDFunctions.scala:640)
at spark.scheduler.ResultTask.run(ResultTask.scala:77)
at spark.executor.Executor$TaskRunner.run(Executor.scala:98)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
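on the line that increments the accumulator myCounter.
My question is: can accumulators only be used in "top-level" anonymous functions applied directly to an RDD, and not in nested functions? If so, why does my call succeed locally and fail on the cluster?
Edit: increased the verbosity of the exception.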
What if you define the function like this:
def foo(line: String, myc: org.apache.spark.Accumulator[Int]) = {
myc += 1
}
and then call it like this:
foo(line, myCounter)
?
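To make that suggestion concrete, here is a minimal sketch of the whole job with the accumulator passed in as a parameter. It rests on several assumptions that are not from the question: Spark 2.x or later (so it uses sc.longAccumulator and LongAccumulator instead of the older sc.accumulator), a local[2] master, a made-up in-memory input, and a placeholder body for foo that just splits the line into words. The idea is that a local val captured by the closure gets serialized and shipped with the task. A field of a singleton object such as MyClass is instead looked up on whichever JVM runs the task, which is a plausible (but here unconfirmed) cause of the NullPointerException on the cluster.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.LongAccumulator

object AccumulatorSketch {
  // Hypothetical stand-in for foo: the accumulator is an explicit parameter,
  // so the closure that calls foo has to capture it and ship it to the executors.
  def foo(line: String, counter: LongAccumulator): Seq[String] = {
    counter.add(1)          // count one processed line
    line.split(" ").toSeq   // placeholder for "compute something on the input"
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local master and app name chosen only for this sketch.
    val conf = new SparkConf().setAppName("AccumulatorSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Local val, not a field of the enclosing object.
      val myCounter = sc.longAccumulator("processed lines")
      // Assumption: in-memory sample data instead of sc.textFile(inputpath).
      val myRDD = sc.parallelize(Seq("a b", "c d e", "f"))
      val words = myRDD.flatMap(line => foo(line, myCounter))
      words.count() // flatMap is lazy; the accumulator is only updated once an action runs
      println(myCounter.value)
    } finally {
      sc.stop()
    }
  }
}
```

Keep in mind that updates made inside a transformation like flatMap can be applied more than once if a task or stage is re-executed, so a counter like this is best treated as a diagnostic rather than an exact result.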