Posts by var*_*n r

spark scala: Convert an Array of Struct column to a String column

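I have a column of type array<struct> that was inferred from a JSON file. I want to convert the array<struct> to a string so that I can keep this array column in Hive as-is and export it to an RDBMS as a single column.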

temp.json

{"properties":{"items":[{"invoicid":{"value":"923659"},"job_id":
{"value":"296160"},"sku_id":
{"value":"312002"}}],"user_id":"6666","zip_code":"666"}}

Processing:

scala> val temp = spark.read.json("s3://check/1/temp1.json")
temp: org.apache.spark.sql.DataFrame = [properties: struct<items:
array<struct<invoicid:struct<value:string>,job_id:struct<value:string>,sku_id:struct<value:string>>>, user_id: string ... 1 more field>]

    scala> temp.printSchema
    root
     |-- properties: struct (nullable = true)
     |    |-- items: array (nullable = true)
     |    |    |-- element: struct (containsNull = true)
     |    |    |    |-- invoicid: struct (nullable = true)
     |    |    |    |    |-- value: string (nullable = true)
     |    |    |    |-- job_id: struct (nullable = true)
     |    |    |    | …
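One possible way to get a single string column out of `properties.items` is to serialize the array back to JSON text with `to_json` from `org.apache.spark.sql.functions`. The sketch below is not taken from the original post. It assumes a Spark version whose `to_json` accepts an array of structs (believed to be 2.2 or later) and a local master. The object name `ItemsToJson`, the helper `itemsAsJson`, and the output column name `items_json` are hypothetical names chosen for illustration, and the sample record is inlined instead of being read from S3.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, to_json}

object ItemsToJson {
  // Same record as temp.json, inlined so the sketch does not depend on S3.
  val sample: String =
    """{"properties":{"items":[{"invoicid":{"value":"923659"},"job_id":{"value":"296160"},"sku_id":{"value":"312002"}}],"user_id":"6666","zip_code":"666"}}"""

  // Hypothetical helper: replaces the array<struct> column with its JSON text
  // and pulls the remaining scalar fields up to top-level columns.
  def itemsAsJson(df: DataFrame): DataFrame =
    df.select(
      to_json(col("properties.items")).as("items_json"), // array<struct> -> string
      col("properties.user_id").as("user_id"),
      col("properties.zip_code").as("zip_code")
    )

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for trying this out; a cluster job would not set it here.
    val spark = SparkSession.builder().appName("items-to-json").master("local[*]").getOrCreate()
    import spark.implicits._

    val temp = spark.read.json(Seq(sample).toDS())
    val flat = itemsAsJson(temp)

    flat.printSchema()             // items_json should now be reported as string
    flat.show(truncate = false)

    spark.stop()
  }
}
```

Because every column in the result is a plain string, it should be storable in a Hive table and exportable to an RDBMS as ordinary text columns. If the structure is needed again later, `from_json` with an explicit schema can parse `items_json` back into an array of structs.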

arrays json scala apache-spark

Score: 3, Answers: 1, Views: 5404
