Post by rab*_*nnh

Spark 2.0.2 doesn't seem to think that "groupBy" is returning a DataFrame

This feels a bit silly, but I'm migrating from Spark 1.6.1 to Spark 2.0.2. I was using the Databricks CSV library, and am now trying to use the built-in CSV DataFrameWriter.
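
For context, the writer change itself is small. A minimal sketch of the switch, assuming a hypothetical output DataFrame "dfOut" and output path "sOutputPath" (neither appears in the post):

    import org.apache.spark.sql.DataFrame

    // Hypothetical helper for illustration only; dfOut and sOutputPath are assumptions
    def writeCsv(dfOut: DataFrame, sOutputPath: String): Unit = {
      // Spark 1.6 style, via the external Databricks spark-csv package:
      // dfOut.write.format("com.databricks.spark.csv").option("header", "true").save(sOutputPath)

      // Spark 2.x style, using the built-in CSV DataFrameWriter:
      dfOut.write.option("header", "true").csv(sOutputPath)
    }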

The following code:

    // Imports needed for SQLContext, DataFrame, and the lit() function
    import org.apache.spark.sql.{DataFrame, SQLContext}
    import org.apache.spark.sql.functions.lit

    // Get an SQLContext
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    var sTS = lTimestampToSummarize.toString()
    val sS3InputPath = "s3://measurements/" + sTS + "/*"

    // Read all measurements - note that all subsequent ETLs will reuse dfRaw
    val dfRaw = sqlContext.read.json(sS3InputPath)

    // Filter just the user/segment timespent records
    val dfSegments = dfRaw.filter("segment_ts <> 0").withColumn("views", lit(1))

    // Aggregate views and timespent per user/segment tuple
    val dfUserSegments : DataFrame = dfSegments.groupBy("company_id", "division_id", "department_id", "course_id", "user_id", "segment_id") …
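
The snippet is cut off at the groupBy call, but the error in the title matches how the API is typed: in Spark 2.x, groupBy returns a RelationalGroupedDataset (the Spark 1.6 equivalent was GroupedData), not a DataFrame, and a DataFrame (now an alias for Dataset[Row]) only comes back after an aggregation step such as agg. A minimal sketch of that shape, continuing from dfSegments above; the "time_spent" column name is an assumption, not taken from the post:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.sum

    // groupBy alone yields a RelationalGroupedDataset, not a DataFrame ...
    val grouped = dfSegments.groupBy("company_id", "division_id", "department_id",
                                     "course_id", "user_id", "segment_id")

    // ... and only the aggregation produces a DataFrame (Dataset[Row]) again
    val dfUserSegments: DataFrame = grouped.agg(
      sum("views").alias("views"),          // "views" is the lit(1) column added above
      sum("time_spent").alias("timespent")  // hypothetical column name, assumed
    )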

Tags: scala, dataframe, apache-spark

1 vote · 1 answer · 224 views
