How to parse nested JSON objects in Spark SQL?

Non*_*one 22 json apache-spark apache-spark-sql

I have a schema as shown below. How do I parse the nested objects?

root
 |-- apps: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- appName: string (nullable = true)
 |    |    |-- appPackage: string (nullable = true)
 |    |    |-- Ratings: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- date: string (nullable = true)
 |    |    |    |    |-- rating: long (nullable = true)
 |-- id: string (nullable = true)

Vas*_*ias 25

Assuming you read in the JSON file and print the schema you are showing us, like this:

DataFrame df = sqlContext.read().json("/path/to/file").toDF();
df.registerTempTable("df");
df.printSchema();

Then you can select the nested objects inside the struct type like so...

// Select the array column, which is named "apps" in the schema above (not "app")
DataFrame apps = df.select("apps");
apps.registerTempTable("apps");
apps.printSchema();
apps.show();

// Dot notation pulls the nested field out of each struct in the array
DataFrame appName = apps.select("apps.appName");
appName.registerTempTable("appName");
appName.printSchema();
appName.show();
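Dot notation like "apps.appName" returns arrays here, because "apps" is an array column. To get one row per app and per rating instead, the nested arrays can be flattened with explode. A minimal Scala sketch, assuming Spark 2.x, a local SparkSession, and the placeholder file path from above:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// Illustrative sketch for the schema shown in the question.
val spark = SparkSession.builder().master("local[*]").appName("nested-json").getOrCreate()
val df = spark.read.json("/path/to/file")

// One row per element of the apps array.
val apps = df.select(col("id"), explode(col("apps")).as("app"))

// One row per rating, keeping the parent app fields.
val ratings = apps.select(
  col("id"),
  col("app.appName"),
  col("app.appPackage"),
  explode(col("app.Ratings")).as("r"))

ratings.select("id", "appName", "appPackage", "r.date", "r.rating").show()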

  • Just to add: the answer's code above does not need `registerTempTable` to work; `registerTempTable` is only needed when you want to run Spark SQL queries. Since Spark 2.0, `registerTempTable` has been deprecated and replaced by `createOrReplaceTempView`. (8)
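
To illustrate that comment, a short Scala sketch of the Spark 2.x equivalent, assuming a SparkSession named spark as in the sketch above:

// Spark 2.x: create a temp view instead of calling registerTempTable, then query it with SQL.
val df = spark.read.json("/path/to/file")
df.createOrReplaceTempView("df")
spark.sql("SELECT id, apps.appName FROM df").show()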

ben*_*man 5

Try this:

val nameAndAddress = sqlContext.sql("""
    SELECT name, address.city, address.state
    FROM people
""")
nameAndAddress.collect.foreach(println)

Source: https://databricks.com/blog/2015/02/02/an-introduction-to-json-support-in-spark-sql.html
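
The query assumes a temporary table named "people" has already been registered from the JSON data. A minimal Scala sketch of that setup (Spark 2.x API; the people.json path is illustrative):

import org.apache.spark.sql.SparkSession

// Read the JSON file and expose it as a view named "people" for the SQL query above.
val spark = SparkSession.builder().master("local[*]").appName("people").getOrCreate()
val people = spark.read.json("/path/to/people.json")
people.createOrReplaceTempView("people")

val nameAndAddress = spark.sql("""
    SELECT name, address.city, address.state
    FROM people
""")
nameAndAddress.collect.foreach(println)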