Reading a large JSON file containing multiple JSON objects with Python ijson

Question


I am trying to parse a large (~100 MB) JSON file with the ijson package, which lets me work with the file efficiently. However, after writing code like this:

with open(filename, 'r') as f:
    parser = ijson.parse(f)
    for prefix, event, value in parser:
        if prefix == "name":
            print(value)


I found that the code parses only the first line and none of the remaining lines in the file!

Here is part of my JSON file:

{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.012000}
{"name":"engine_speed","value":772,"timestamp":1364323939.027000}
{"name":"vehicle_speed","value":0,"timestamp":1364323939.029000}
{"name":"accelerator_pedal_position","value":0,"timestamp":1364323939.035000}


It looks to me as though ijson parses only one JSON object.

Can someone suggest how to work around this?

Answer 1

use*_*253 6

Since the excerpt looks like a set of lines, each of which is a standalone JSON document, it should be parsed accordingly:

# Each JSON document is small, so there is no need for iterative parsing
import json

with open(filename, 'r') as f:
    for line in f:
        data = json.loads(line)
        # data['name'], data['value'] and data['timestamp'] now
        # contain the corresponding values

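The line-by-line approach above can be wrapped in a small generator. This is only a sketch, not part of the original answer: the helper name `iter_json_lines` is hypothetical, and it assumes every non-blank line holds exactly one complete JSON document. It skips blank lines and reports the line number when a line fails to parse. An in-memory `io.StringIO` stands in for the real file here.

```python
import io
import json


def iter_json_lines(lines):
    """Yield one parsed value per non-blank line (hypothetical helper)."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. an empty line at the end
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Re-raise with the line number so a bad record is easy to find
            raise ValueError(f"line {line_number}: {exc}") from exc


# Stand-in for open(filename): any iterable of text lines works
sample = io.StringIO(
    '{"name":"engine_speed","value":772,"timestamp":1364323939.027000}\n'
    '\n'
    '{"name":"vehicle_speed","value":0,"timestamp":1364323939.029000}\n'
)
records = list(iter_json_lines(sample))
names = [record["name"] for record in records]
```

With a real file you would pass the open file object instead of `sample`; because the generator consumes one line at a time, the whole file is never held in memory.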

Answer 2

Mr-*_*IDE 6

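Unfortunately, the ijson library (v2.3 as of March 2018) cannot parse multiple JSON objects. It can handle only one top-level object, and if you try to parse a second one you get the error "ijson.common.JSONError: Additional data". See the bug report here:

This is a significant limitation. However, as long as each JSON object is followed by a newline, you can parse each line independently, like this: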

import io
import ijson

with open(filename, encoding="UTF-8") as json_file:
    cursor = 0  # running character offset of the current line
    for line_number, line in enumerate(json_file):
        print("Processing line", line_number + 1, "at cursor index:", cursor)
        line_as_file = io.StringIO(line)
        # Use a new parser for each line
        json_parser = ijson.parse(line_as_file)
        # "event" avoids shadowing the built-in name "type"
        for prefix, event, value in json_parser:
            print("prefix=", prefix, "type=", event, "value=", value)
        cursor += len(line)


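You are still streaming the file rather than loading it entirely into memory, so this works on large JSON files. It also uses the line-streaming technique from "How to jump to a particular line in a huge text file?" and enumerate() from "Accessing the index in 'for' loops?".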
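The per-line approach depends on a newline after every object. If that does not hold (objects placed back to back, or spread over several lines), one possible alternative is the standard library's `json.JSONDecoder.raw_decode`, which parses one value from the start of a string and returns the index where it stopped. The sketch below is an assumption-laden illustration rather than the answerer's method: the function name `iter_concatenated_json` and the `chunk_size` parameter are made up, and it assumes the top-level values are objects or arrays (a bare number split across two chunks would be misread).

```python
import io
import json


def iter_concatenated_json(stream, chunk_size=65536):
    """Yield top-level JSON values from a text stream, read in chunks.

    Hypothetical helper; assumes each top-level value is an object or array.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        buffer += chunk
        while True:
            buffer = buffer.lstrip()  # raw_decode does not skip leading whitespace
            if not buffer:
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                if chunk:
                    break  # value is probably incomplete: read more input
                raise  # end of input reached and the rest is still invalid
            yield obj
            buffer = buffer[end:]
        if not chunk:
            break  # end of stream


# Objects with no newline between them; a tiny chunk size forces splits
sample = io.StringIO(
    '{"name":"a","value":1}{"name":"b","value":2} {"name":"c","value":3}'
)
objects = list(iter_concatenated_json(sample, chunk_size=7))
names = [obj["name"] for obj in objects]
```

Only the unparsed tail of the input is buffered, so memory use stays close to the size of the largest single object. One caveat of this sketch: malformed input in the middle of the stream is not reported until the end of the file, because a parse failure is first treated as "need more data".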

Archived:	9 years, 9 months ago
Views:	8752
Last recorded:	6 years, 11 months ago