Java: Storing a large map in resources

Question

Eik*_*chu 5 java serialization dictionary

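I need to use a big file that contains String,String pairs. Because I want to ship it with a JAR, I opted to include a serialized and gzipped version in the application's resources folder. This is how I created the serialized file: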

ObjectOutputStream out = new ObjectOutputStream(
            new BufferedOutputStream(new GZIPOutputStream(new FileOutputStream(OUT_FILE_PATH, false))));
out.writeObject(map);
out.close();

I chose to use a HashMap<String,String>. The resulting file is 60 MB, and the map contains about 4 million entries.

Now, whenever I need the map, I deserialize it using:

final InputStream in = FileUtils.getResource("map.ser.gz");
final ObjectInputStream ois = new ObjectInputStream(new BufferedInputStream(new GZIPInputStream(in)));
map = (Map<String, String>) ois.readObject();
ois.close();

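This takes about 10 to 15 seconds. Is there a better way to store such a large map in a JAR? I ask because I also use the Stanford CoreNLP library, which itself uses large model files but seems to perform better in this respect. I tried to locate the code where the model files are read, but gave up.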
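For reference, the round trip described above can be reproduced with standard-library classes only. The following is a minimal sketch, not the asker's actual code: it assumes an in-memory buffer instead of the 60 MB resource file, a much smaller map, and the hypothetical class name GzipMapRoundTrip. It may be useful as a baseline for timing the deserialization step in isolation.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical name; an in-memory variant of the question's approach.
class GzipMapRoundTrip {

    // Serialize the map through a gzip stream, as in the question,
    // but into a byte array instead of a file.
    static byte[] serialize(Map<String, String> map) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new GZIPOutputStream(bytes)))) {
            out.writeObject(map);
        } // closing the outer stream flushes and finishes the gzip stream
        return bytes.toByteArray();
    }

    // Read the map back through the matching chain of input streams.
    @SuppressWarnings("unchecked")
    static Map<String, String> deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(
                        new GZIPInputStream(new ByteArrayInputStream(data))))) {
            return (Map<String, String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: 100,000 entries stand in for the ~4 million in the question.
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            map.put("key" + i, "value" + i);
        }

        byte[] data = serialize(map);

        long start = System.nanoTime();
        Map<String, String> restored = deserialize(data);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("compressed bytes: " + data.length);
        System.out.println("entries restored: " + restored.size());
        System.out.println("deserialize ms:   " + elapsedMs);
    }
}
```

Timings from a sketch like this depend on the JVM, heap settings, and map size, so they are only indicative.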

Answer 1

Nic*_*tto 0

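What you can do is apply a technique from the book Java Performance: The Definitive Guide by Scott Oaks, which stores the zipped content of the object in a byte array. For this we need a wrapper class, which I call MapHolder here: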

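import java.io.*;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class MapHolder implements Serializable {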
    // This will contain the zipped content of my map
    private byte[] content;
    // My actual map defined as transient as I don't want to serialize its 
    // content but its zipped content
    private transient Map<String, String> map;

    public MapHolder(Map<String, String> map) {
        this.map = map;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream zip = new GZIPOutputStream(baos);
            ObjectOutputStream oos = new ObjectOutputStream(
                new BufferedOutputStream(zip))) {
            oos.writeObject(map);
        }
        this.content = baos.toByteArray();
        out.defaultWriteObject();
        // Clear the temporary field content
        this.content = null;
    }

    private void readObject(ObjectInputStream in) throws IOException,
        ClassNotFoundException {
        in.defaultReadObject();
        try (ByteArrayInputStream bais = new ByteArrayInputStream(content);
            GZIPInputStream zip = new GZIPInputStream(bais);
            ObjectInputStream ois = new ObjectInputStream(
                new BufferedInputStream(zip))) {
            this.map = (Map<String, String>) ois.readObject();
            // Clean the temporary field content
            this.content = null;
        }
    }

    public Map<String, String> getMap() {
        return this.map;
    }
}

Your code will then simply be:

final ByteArrayInputStream in = new ByteArrayInputStream(
    Files.readAllBytes(Paths.get("/tmp/map.ser"))
);
final ObjectInputStream ois = new ObjectInputStream(in);
MapHolder holder = (MapHolder) ois.readObject();
map = holder.getMap();
ois.close();

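As you may have noticed, you no longer gzip the stream yourself: the content is zipped internally when the MapHolder instance is serialized.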
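The answer shows only the reading side. A minimal sketch of the full round trip, under a few assumptions, might look like this: the wrapper is reproduced as a nested class so the example is self-contained, an in-memory buffer replaces /tmp/map.ser, and the names MapHolderRoundTrip, write, and read are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class wrapping the answer's MapHolder idea.
class MapHolderRoundTrip {

    // Same idea as the wrapper above: only the zipped bytes are serialized.
    static class MapHolder implements Serializable {
        private static final long serialVersionUID = 1L;
        private byte[] content;                     // zipped form of the map
        private transient Map<String, String> map;  // never serialized directly

        MapHolder(Map<String, String> map) {
            this.map = map;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(
                    new BufferedOutputStream(new GZIPOutputStream(baos)))) {
                oos.writeObject(map);
            }
            this.content = baos.toByteArray();
            out.defaultWriteObject(); // writes the byte[] field only
            this.content = null;      // drop the temporary buffer
        }

        @SuppressWarnings("unchecked")
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();   // restores the byte[] field
            try (ObjectInputStream ois = new ObjectInputStream(
                    new BufferedInputStream(new GZIPInputStream(
                            new ByteArrayInputStream(content))))) {
                this.map = (Map<String, String>) ois.readObject();
            }
            this.content = null;
        }

        Map<String, String> getMap() {
            return map;
        }
    }

    // Writing side: a plain ObjectOutputStream, no outer gzip layer.
    static byte[] write(MapHolder holder) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(holder);
        }
        return bytes.toByteArray();
    }

    // Reading side: mirrors the answer's snippet, minus the file access.
    static MapHolder read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(data))) {
            return (MapHolder) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> map = new HashMap<>();
        map.put("hello", "world");
        MapHolder restored = read(write(new MapHolder(map)));
        System.out.println(restored.getMap());
    }
}
```

To produce a file for the reading snippet above, the bytes returned by write could be saved with Files.write(Paths.get("/tmp/map.ser"), data). Whether this layout is actually faster than the question's approach is not shown here and would need to be measured.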
