Posts by ata*_*pha

Spark read.json only reads the first object

I have a multiLine JSON file and use Spark's read.json to read it. The problem is that it only reads the first object from that JSON file.

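// Read the JSON at `path`; multiLine allows a single record to span several lines
val dataFrame = spark.read.option("multiLine", true).option("mode", "PERMISSIVE").json(path)
// Write each row as a line of text into the output directory "DataFrame"
dataFrame.rdd.saveAsTextFile("DataFrame")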

Sample JSON:

{
    "_id" : "589895e123c572923e69f5e7",
    "thing" : "54eb45beb5f1e061454c5bf4",
    "timeline" : [ 
        {
            "reason" : "TRIP_START",
            "timestamp" : "2017-02-06T17:20:18.007+02:00",
            "type" : "TRIP_EVENT",
            "location" : [ 
                11.1174091, 
                69.1174091
            ],
            "endLocation" : [],
            "startLocation" : []
        },
        {
            "reason" : "TRIP_END",
            "timestamp" : "2017-02-06T17:25:26.026+02:00",
            "type" : "TRIP_EVENT",
            "location" : [ 
                11.5691428, 
                48.1122443
            ],
            "endLocation" : [],
            "startLocation" : []
        }
    ],
    "__v" : 0
}
{ …
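A likely explanation (an assumption, not confirmed in the post): with multiLine enabled, Spark parses each file as a single JSON value, either one object or one array, so a file holding several objects written back to back, like the sample above, yields only the first object. One possible workaround is to split the file text into one string per top-level object before handing it to Spark. The sketch below does that in plain Scala by tracking brace depth while skipping braces inside string literals. The names `SplitConcatenatedJson` and `splitTopLevelObjects` are hypothetical, and the sketch assumes the file contains only top-level objects (no top-level arrays or bare values).

```scala
object SplitConcatenatedJson {
  // Splits text containing several top-level JSON objects written back to back
  // (e.g. "{...}\n{...}") into one string per object.
  // Braces inside string literals are ignored, and escaped quotes (\") are handled.
  // An unterminated trailing object is silently dropped.
  def splitTopLevelObjects(text: String): Vector[String] = {
    val out = Vector.newBuilder[String]
    var depth = 0          // current nesting depth of { }
    var start = -1         // index where the current top-level object began
    var inString = false   // true while inside a "..." literal
    var escaped = false    // true if the previous char was a backslash inside a string
    var i = 0
    while (i < text.length) {
      val c = text.charAt(i)
      if (inString) {
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else {
        c match {
          case '"' => inString = true
          case '{' =>
            if (depth == 0) start = i
            depth += 1
          case '}' =>
            depth -= 1
            // Closing brace of a top-level object: emit the whole object
            if (depth == 0) out += text.substring(start, i + 1)
          case _ => ()
        }
      }
      i += 1
    }
    out.result()
  }

  def main(args: Array[String]): Unit = {
    val sample =
      """{ "_id" : "a", "timeline" : [ { "reason" : "TRIP_START" } ], "__v" : 0 }
        |{ "_id" : "b", "note" : "a } inside a string", "__v" : 0 }""".stripMargin
    val objects = splitTopLevelObjects(sample)
    objects.foreach(println)
    println(objects.length) // 2
  }
}
```

If that assumption holds, the resulting strings could be passed to Spark as a Dataset[String] (for example via `import spark.implicits._` and `.toDS()`), since `spark.read.json` also accepts a `Dataset[String]` with one JSON record per element (Spark 2.2 and later). Alternatively, rewriting the file as a single JSON array, or as JSON Lines with one object per line and reading it without multiLine, would avoid the custom splitter.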

json scala apache-spark
