Dav*_*cia 12 scala bigdata apache-spark
I need to measure the run time of a piece of code in Scala. The code is:
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val data = sc.textFile("/home/david/Desktop/Datos Entrada/household/household90Parseado.txt")
val parsedData = data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()
val numClusters = 5
val numIterations = 10
val clusters = KMeans.train(parsedData, numClusters, numIterations)
I need to know how long this code takes to run, and the time must be in seconds. Thank you very much.
eva*_*man 29
Based on the discussion here, you will want to use System.nanoTime to measure the elapsed time:
val t1 = System.nanoTime
/* your code */
val duration = (System.nanoTime - t1) / 1e9d
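To reuse the System.nanoTime pattern around arbitrary code, it can be wrapped in a small helper. The following is only a sketch: the object name `ElapsedSeconds` and the function `timed` are hypothetical names chosen for illustration, and the summation is a stand-in workload, not the KMeans code from the question.

```scala
object ElapsedSeconds {
  // Hypothetical helper: evaluates `block` once and returns its result
  // together with the elapsed time in seconds.
  def timed[T](block: => T): (T, Double) = {
    val t1 = System.nanoTime
    val result = block
    val seconds = (System.nanoTime - t1) / 1e9d
    (result, seconds)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in workload; replace it with the code to be measured.
    val (sum, seconds) = timed { (1L to 1000000L).sum }
    println(f"sum = $sum, elapsed = $seconds%.3f s")
  }
}
```

Returning the duration as a value, rather than printing it, makes it possible to log it or compare several runs.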
The most basic approach is simply to record the start time and the end time, then subtract one from the other.
val startTimeMillis = System.currentTimeMillis()
/* your code goes here */
val endTimeMillis = System.currentTimeMillis()
val durationSeconds = (endTimeMillis - startTimeMillis) / 1000.0 // divide by a Double to keep fractional seconds
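When converting a millisecond difference to seconds, dividing by the integer literal 1000 truncates to whole seconds, while dividing by 1000.0 keeps the fraction. A small sketch of the difference follows; the timestamps are made-up values for illustration, not measurements, and the object name `MillisToSeconds` is hypothetical.

```scala
object MillisToSeconds {
  // Hypothetical timestamps, chosen only for illustration.
  val startTimeMillis: Long = 10000L
  val endTimeMillis: Long = 11500L

  // Long / Int stays integral, so the fractional part is dropped.
  val wholeSeconds: Long = (endTimeMillis - startTimeMillis) / 1000

  // Dividing by a Double keeps the fraction.
  val fractionalSeconds: Double = (endTimeMillis - startTimeMillis) / 1000.0

  def main(args: Array[String]): Unit = {
    println(wholeSeconds)      // 1
    println(fractionalSeconds) // 1.5
  }
}
```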
For Spark < 2.1.0, you can define this function explicitly in your code to measure time in milliseconds:
/**
 * Executes some code block and prints to stdout the time taken to execute the block. This is
 * available in Scala only and is used primarily for interactive testing and debugging.
 */
def time[T](f: => T): T = {
  val start = System.nanoTime()
  val ret = f
  val end = System.nanoTime()
  println(s"Time taken: ${(end - start) / 1000 / 1000} ms")
  ret
}
Usage:
time {
Seq("1", "2").toDS().count()
}
//Time taken: 3104 ms
For Spark >= 2.1.0, SparkSession has a built-in function for this: you can use spark.time.
Usage:
spark.time {
Seq("1", "2").toDS().count()
}
//Time taken: 3104 ms
You can use ScalaMeter: https://scalameter.github.io/
Just put your code block inside the braces:
import org.scalameter._ // brings measure into scope

val executionTime = measure {
  // code goes here
}
You can configure it to warm up the JVM first, which makes the measurement more reliable:
val executionTime = withWarmer(new Warmer.Default) measure {
//code goes here
}
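To illustrate why warming up the JVM matters, the idea can be sketched by hand in plain Scala. This is not ScalaMeter's implementation or API: `WarmupTiming` and `meanSeconds` are hypothetical names, and the warm-up and run counts are arbitrary assumptions. The block is first run a few times untimed, so that the JIT compiler has a chance to optimize it, and only then are the timed runs averaged.

```scala
object WarmupTiming {
  // Hypothetical helper, not part of ScalaMeter: runs `block` `warmups` times
  // without timing, then returns the mean time in seconds over `runs` timed runs.
  def meanSeconds[T](warmups: Int, runs: Int)(block: => T): Double = {
    require(runs > 0, "runs must be positive")
    (1 to warmups).foreach(_ => block)
    val samples = (1 to runs).map { _ =>
      val t0 = System.nanoTime
      block
      (System.nanoTime - t0) / 1e9d
    }
    samples.sum / samples.size
  }

  def main(args: Array[String]): Unit = {
    // Stand-in workload; the counts 5 and 10 are arbitrary.
    val mean = meanSeconds(warmups = 5, runs = 10) {
      (1 to 100000).map(_.toDouble).sum
    }
    println(f"mean time: $mean%.6f s")
  }
}
```

A dedicated benchmarking library handles more than this sketch does, so the hand-rolled version is best treated as a quick sanity check.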
Starting with Spark 2, we can use spark.time(<command>) (so far only in Scala) to get the time taken to execute an action or transformation.
Example:
Find the number of records in a DataFrame:
scala> spark.time(
sc.parallelize(Seq("foo","bar")).toDF().count() //create df and count
)
Time taken: 54 ms //total time for the execution
res76: Long = 2 //count of records