This feels a bit silly, but I'm migrating from Spark 1.6.1 to Spark 2.0.2. I have been using the Databricks CSV library, and now I'm trying to switch to the built-in CSV DataFrameWriter.
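For reference, the change in question is roughly the following (a minimal sketch; dfResult and the output path are placeholders, not taken from my actual job):

// Spark 1.6.x, using the external spark-csv package
dfResult.write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("s3://output/summary")

// Spark 2.x, using the built-in CSV DataFrameWriter
dfResult.write
  .option("header", "true")
  .csv("s3://output/summary")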
The following code:
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.lit

// Get an SQLContext (sc is the existing SparkContext)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
var sTS = lTimestampToSummarize.toString()
val sS3InputPath = "s3://measurements/" + sTS + "/*"
// Read all measurements - note that all subsequent ETLs will reuse dfRaw
val dfRaw = sqlContext.read.json(sS3InputPath)
// Filter just the user/segment timespent records
val dfSegments = dfRaw.filter("segment_ts <> 0").withColumn("views", lit(1))
// Aggregate views and timespent per user/segment tuples
val dfUserSegments : DataFrame = dfSegments.groupBy("company_id", "division_id", "department_id", "course_id", "user_id", "segment_id") …
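Incidentally, since this is a 2.0.2 migration: new SQLContext(sc) still compiles, but SparkSession is the preferred entry point in Spark 2.x. A minimal sketch of the equivalent setup (the app name is an assumption, the rest mirrors the code above):

import org.apache.spark.sql.SparkSession

// SparkSession subsumes SQLContext in Spark 2.x
val spark = SparkSession.builder().appName("measurements-etl").getOrCreate()
import spark.implicits._
val dfRaw = spark.read.json(sS3InputPath)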