Fel*_*ipe 2 scala apache-spark
如何查看SparkContext是否有内容正在执行以及何时完成所有内容我会停止它?因为目前我在等待30秒才调用SparkContext.stop,否则我的应用程序会抛出错误.
import org.apache.log4j.Level
import org.apache.log4j.Logger
import org.apache.spark.SparkContext
object RatingsCounter extends App {
// set the log level to print only errors
Logger.getLogger("org").setLevel(Level.ERROR)
// create a SparkContext using every core of the local machine, named RatingsCounter
val sc = new SparkContext("local[*]", "RatingsCounter")
// load up each line of the ratings data into an RDD (Resilient Distributed Dataset)
val lines = sc.textFile("src/main/resource/u.data", 0)
// convert each line to s string, split it out by tabs and extract the third field.
// The file format is userID, movieID, rating, timestamp
val ratings = lines.map(x => x.toString().split("\t")(2))
// count up how many times each value occurs
val results = ratings.countByValue()
// sort the resulting map of (rating, count) tuples
val sortedResults = results.toSeq.sortBy(_._1)
// print each result on its own line.
sortedResults.foreach { case (key, value) => println("movie ID: " + key + " - rating times: " + value) }
Thread.sleep(30000)
sc.stop()
}
Run Code Online (Sandbox Code Playgroud)
Spark应用程序应定义main()方法而不是扩展scala.App.子类scala.App可能无法正常工作.
而且,由于您正在扩展App,因此您会遇到意外行为.
您可以在有关自包含应用程序的官方文档中阅读更多相关信息.
App使用DelayedInit并可能导致初始化问题.使用主要方法,您可以知道发生了什么.摘录自reddit.
object HelloWorld extends App {
var a = 1
a + 1
override def main(args: Array[String]) {
println(a) // guess what's the value of a ?
}
}
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
758 次 |
| 最近记录: |