Post by gon*_*ure

How to parse an array of JSON objects with Spark SQL

I have JSON data like the following:

{"Id":11,"data":[{"package":"com.browser1","activetime":60000},{"package":"com.browser6","activetime":1205000},{"package":"com.browser7","activetime":1205000}]}
{"Id":12,"data":[{"package":"com.browser1","activetime":60000},{"package":"com.browser6","activetime":1205000}]} 
......

This JSON records the active time of applications, and the goal is to compute the total active time of each application.

I use Spark SQL to parse the JSON:

Scala

val sqlContext = spark.sqlContext                                // `spark` is the SparkSession provided by spark-shell (Spark 2.x)
val behavior = sqlContext.read.json("behavior-json.log")         // read the JSON-lines file
behavior.cache()
behavior.createOrReplaceTempView("behavior")
val appActiveTime = sqlContext.sql("SELECT data FROM behavior")  // SQL query
appActiveTime.show(100)                                          // print the DataFrame
appActiveTime.rdd.foreach(println)                               // print the underlying RDD

But the printed DataFrame looks like this:

+----------------------------------------------------------------------+
|                                                                  data|
+----------------------------------------------------------------------+
|                      [[60000, com.browser1], [12870000, com.browser]]|
|                        [[60000, com.browser1], [120000, com.browser]]|
|                        [[60000, com.browser1], [120000, com.browser]]|
|                       [[60000, com.browser1], [1207000, com.browser]]|
|                                               [[120000, com.browser]]|
|                      [[60000, com.browser1], [1204000, com.browser5]]|
|                      [[60000, com.browser1], [12075000, com.browser]]|
|                                               [[60000, com.browser1], …
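For reference, below is a minimal sketch of the kind of flattening and aggregation being asked for, assuming Spark 2.x. It uses explode to turn each (package, activetime) struct in the data array into its own row, then sums activetime per package; `behavior` is the DataFrame read above, and the column names come from the JSON sample.

import org.apache.spark.sql.functions._

// Flatten the `data` array: one row per (package, activetime) struct.
val flattened = behavior
  .select(explode(col("data")).as("app"))
  .select(col("app.package").as("package"), col("app.activetime").as("activetime"))

// Total active time per application.
val totalActiveTime = flattened
  .groupBy("package")
  .agg(sum("activetime").as("totalActiveTime"))

totalActiveTime.show()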

json scala bigdata apache-spark apache-spark-sql

Score: 3 · Answers: 1 · Views: 7804
