如何使用Spark Scala读取具有特定格式的Json文件?

Spa*_*ser 3 json scala apache-spark

我正在尝试读取一个Json文件,如:

[ 
{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"31","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"30","Nrout":"5","up":null,"Crate":"2"} 
,{"MLrate":"34","Nrout":"0","up":null,"Crate":"4"} 
,{"MLrate":"33","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"30","Nrout":"8","up":null,"Crate":"2"} 
]} 
,{"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"0"} 
,{"MLrate":"35","Nrout":"1","up":null,"Crate":"5"} 
,{"MLrate":"30","Nrout":"6","up":null,"Crate":"2"} 
,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} 
,{"MLrate":"38","Nrout":"8","up":null,"Crate":"1"} 
]} 
,...
] 
Run Code Online (Sandbox Code Playgroud)

我试过这个命令:

    val df = sqlContext.read.json("namefile") 
    df.show() 
Run Code Online (Sandbox Code Playgroud)

但这不起作用:我的专栏无法识别......

zer*_*323 6

如果要使用read.json,则每行需要一个JSON文档.如果您的文件包含带有文档的有效JSON数组,则它将无法按预期工作.例如,如果我们采用您的示例数据输入文件应格式如下:

{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}, {"MLrate":"31","Nrout":"0","up":null,"Crate":"2"}, {"MLrate":"30","Nrout":"5","up":null,"Crate":"2"} ,{"MLrate":"34","Nrout":"0","up":null,"Crate":"4"} ,{"MLrate":"33","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"8","up":null,"Crate":"2"} ]}
{"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"0"} ,{"MLrate":"35","Nrout":"1","up":null,"Crate":"5"} ,{"MLrate":"30","Nrout":"6","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"38","Nrout":"8","up":null,"Crate":"1"} ]}
Run Code Online (Sandbox Code Playgroud)

如果你使用read.json上面的结构,你会看到它按预期解析:

scala> sqlContext.read.json("namefile").printSchema
root
 |-- COL: long (nullable = true)
 |-- DATA: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Crate: string (nullable = true)
 |    |    |-- MLrate: string (nullable = true)
 |    |    |-- Nrout: string (nullable = true)
 |    |    |-- up: string (nullable = true)
 |-- IFAM: string (nullable = true)
 |-- KTM: long (nullable = true)
Run Code Online (Sandbox Code Playgroud)