How to parse a huge JSON file without loading it into memory

Question

I have a large JSON file (2.5MB) containing about 80000 lines.

It looks like this:

{
  "a": 123,
  "b": 0.26,
  "c": [HUGE irrelevant object],
  "d": 32
}

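I only want to store the numeric values for the keys a, b and d and ignore the rest of the JSON (i.e. ignore whatever is in the value of c).

I cannot modify the original JSON, because it is created by a third-party service and I download it from that service's server.

How can I do this without loading the entire file into memory?

I tried using the gson library and created a bean like this: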

public class MyJsonBean {
  @SerializedName("a")
  @Expose
  public Integer a;

  @SerializedName("b")
  @Expose
  public Double b;

  @SerializedName("d")
  @Expose
  public Integer d;
}

But even then, in order to deserialize it with Gson, do I need to download the whole file first, read it into memory, and pass it to Gson as a string?

File myFile = new File(<FILENAME>);
myFile.createNewFile();

URL url = new URL(<URL>);
URLConnection conn = url.openConnection();

// Download the response to disk; try-with-resources flushes and closes both streams.
try (InputStream in = conn.getInputStream();
     OutputStream out = new BufferedOutputStream(new FileOutputStream(myFile))) {
  byte[] buffer = new byte[1024];
  int numRead;
  while ((numRead = in.read(buffer)) != -1) {
    out.write(buffer, 0, numRead);
  }
}

// Read the whole file back into memory as one string (the step I would like to avoid).
byte[] data = Files.readAllBytes(myFile.toPath());
String str = new String(data, StandardCharsets.UTF_8);

Gson gson = new Gson();
MyJsonBean response = gson.fromJson(str, MyJsonBean.class);

System.out.println("a: " + response.a + ", b: " + response.b + ", d: " + response.d);

Is there any way to avoid loading the whole file and get only the relevant values I need?

Answer 1

Mic*_*ber 5

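You should definitely check different approaches and libraries. If you really care about performance, try the Gson, Jackson and JsonPath libraries for this task and pick the fastest one. Either way, you have to download the whole JSON file to local disk first, probably into a TMP folder, and parse it afterwards.

A simple JsonPath solution could look like this: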

import com.jayway.jsonpath.DocumentContext;
import com.jayway.jsonpath.JsonPath;

import java.io.File;

public class JsonPathApp {
    public static void main(String[] args) throws Exception {
        File jsonFile = new File("./resource/test.json").getAbsoluteFile();

        DocumentContext documentContext = JsonPath.parse(jsonFile);
        System.out.println("" + documentContext.read("$.a"));
        System.out.println("" + documentContext.read("$.b"));
        System.out.println("" + documentContext.read("$.d"));
    }
}

Note that I did not create any POJO; I just read the given values using the JSONPath feature, which is similar to XPath. You can do the same with Jackson:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;

public class JsonPathApp {
    public static void main(String[] args) throws Exception {
        File jsonFile = new File("./resource/test.json").getAbsoluteFile();

        ObjectMapper mapper = new ObjectMapper();
        JsonNode root = mapper.readTree(jsonFile);
        System.out.println(root.get("a"));
        System.out.println(root.get("b"));
        System.out.println(root.get("d"));
    }
}

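We do not need JSONPath here, because the values we need are directly in the root node. As you can see, the API looks almost the same. We can also create a POJO structure: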

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.math.BigDecimal;

public class JsonPathApp {
    public static void main(String[] args) throws Exception {
        File jsonFile = new File("./resource/test.json").getAbsoluteFile();

        ObjectMapper mapper = new ObjectMapper();
        Pojo pojo = mapper.readValue(jsonFile, Pojo.class);
        System.out.println(pojo);
    }
}

@JsonIgnoreProperties(ignoreUnknown = true)
class Pojo {
    private Integer a;
    private BigDecimal b;
    private Integer d;

    // getters, setters, toString
}
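
If the goal is to never hold the contents of c in memory at all, a token-level (streaming) reader is another option. The following is only a minimal sketch and not part of the solutions above: it assumes Gson's com.google.gson.stream.JsonReader is on the classpath, and the class name StreamingExtract and its read method are hypothetical names chosen for illustration. It reads a, b and d from the top-level object and calls skipValue() for every other key, so nested values are consumed without being built into objects.

```java
import com.google.gson.stream.JsonReader;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper: keeps only "a", "b" and "d" from the top-level object.
final class StreamingExtract {
    Integer a;
    Double b;
    Integer d;

    static StreamingExtract read(Reader source) throws IOException {
        StreamingExtract result = new StreamingExtract();
        // JsonReader pulls tokens one at a time from the underlying Reader.
        try (JsonReader reader = new JsonReader(source)) {
            reader.beginObject();
            while (reader.hasNext()) {
                String name = reader.nextName();
                switch (name) {
                    case "a": result.a = reader.nextInt(); break;
                    case "b": result.b = reader.nextDouble(); break;
                    case "d": result.d = reader.nextInt(); break;
                    // Consumes the whole value, including nested arrays/objects such as "c".
                    default: reader.skipValue();
                }
            }
            reader.endObject();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Small inline sample standing in for the real payload.
        String sample = "{\"a\": 123, \"b\": 0.26, \"c\": [{\"x\": [1, 2, 3]}], \"d\": 32}";
        StreamingExtract r = read(new StringReader(sample));
        System.out.println("a: " + r.a + ", b: " + r.b + ", d: " + r.d);
    }
}
```

For a real file, the same method could be fed a buffered reader over the file (or over the connection's input stream) instead of the StringReader used in the demo.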

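Even though both libraries allow reading the JSON payload directly from a URL, I suggest downloading it in a separate step, using the best approach you can find. For more information, read this article: Download a File From an URL in Java.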
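
As one possible way to do the separate download step, here is a minimal sketch using only the JDK. The class name TempDownload, the method toTempFile and the "payload-" file prefix are hypothetical names chosen for illustration. Files.copy streams the response to disk, so the payload is never held in memory as a whole.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for the "download first, parse later" step.
final class TempDownload {
    // Streams the body behind `source` into a new temp file and returns its path.
    static Path toTempFile(URL source) throws IOException {
        Path target = Files.createTempFile("payload-", ".json");
        try (InputStream in = source.openStream()) {
            // The temp file already exists, so it has to be replaced explicitly.
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Demo with a local file: URL so the sketch runs without network access.
        Path sample = Files.createTempFile("sample-", ".json");
        Files.write(sample, "{\"a\": 123}".getBytes(StandardCharsets.UTF_8));
        Path copy = toTempFile(sample.toUri().toURL());
        System.out.println(Files.size(copy) + " bytes written to " + copy);
    }
}
```

The returned path can then be handed to any of the parsing variants above, for example via copy.toFile().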
