Eik*_*chu 5 java serialization dictionary
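I need to use a large file containing String,String pairs. Because I want to ship it with a JAR, I chose to include a serialized, gzipped version in the application's resources folder. This is how I create the serialized file: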
ObjectOutputStream out = new ObjectOutputStream(
new BufferedOutputStream(new GZIPOutputStream(new FileOutputStream(OUT_FILE_PATH, false))));
out.writeObject(map);
out.close();
I chose to use a HashMap<String,String>. The resulting file is 60 MB, and the map contains about 4 million entries.
Now, whenever I need the map, I deserialize it like this:
final InputStream in = FileUtils.getResource("map.ser.gz");
final ObjectInputStream ois = new ObjectInputStream(new BufferedInputStream(new GZIPInputStream(in)));
map = (Map<String, String>) ois.readObject();
ois.close();
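This takes about 10 to 15 seconds. Is there a better way to store such a large map in a JAR? I ask because I also use the Stanford CoreNLP library, which itself uses large model files but seems to perform better in this respect. I tried to find the code where the model files are read, but gave up.
What you can do is apply a technique from the book Java Performance: The Definitive Guide by Scott Oaks, which stores the zipped content of the object in a byte array. For this we need a wrapper class, which I call MapHolder here: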
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class MapHolder implements Serializable {
    // This will contain the zipped content of my map
    private byte[] content;
    // My actual map defined as transient as I don't want to serialize its
    // content but its zipped content
    private transient Map<String, String> map;

    public MapHolder(Map<String, String> map) {
        this.map = map;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream zip = new GZIPOutputStream(baos);
             ObjectOutputStream oos = new ObjectOutputStream(
                 new BufferedOutputStream(zip))) {
            oos.writeObject(map);
        }
        this.content = baos.toByteArray();
        out.defaultWriteObject();
        // Clear the temporary field content
        this.content = null;
    }

    private void readObject(ObjectInputStream in) throws IOException,
            ClassNotFoundException {
        in.defaultReadObject();
        try (ByteArrayInputStream bais = new ByteArrayInputStream(content);
             GZIPInputStream zip = new GZIPInputStream(bais);
             ObjectInputStream ois = new ObjectInputStream(
                 new BufferedInputStream(zip))) {
            this.map = (Map<String, String>) ois.readObject();
            // Clear the temporary field content
            this.content = null;
        }
    }

    public Map<String, String> getMap() {
        return this.map;
    }
}
Your code will then simply be:
final ByteArrayInputStream in = new ByteArrayInputStream(
Files.readAllBytes(Paths.get("/tmp/map.ser"))
);
final ObjectInputStream ois = new ObjectInputStream(in);
MapHolder holder = (MapHolder) ois.readObject();
map = holder.getMap();
ois.close();
As you may have noticed, you no longer zip the content yourself: it is zipped internally while the MapHolder instance is being serialized.
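The answer only shows the reading side, so here is a minimal round-trip sketch of how the file might be produced and read back. The class name MapHolderRoundTrip and the helpers write and read are hypothetical names chosen for illustration, and the nested MapHolder is a condensed copy of the class above so that the example compiles on its own. It uses in-memory streams instead of /tmp/map.ser; any OutputStream or InputStream (for example a FileOutputStream or a JAR resource stream) should work the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not part of the original answer.
public class MapHolderRoundTrip {

    // Condensed copy of MapHolder: only the gzipped bytes are serialized.
    static class MapHolder implements Serializable {
        private static final long serialVersionUID = 1L;
        private byte[] content;
        private transient Map<String, String> map;

        MapHolder(Map<String, String> map) {
            this.map = map;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            // Closing the ObjectOutputStream also finishes the gzip stream.
            try (ObjectOutputStream oos =
                     new ObjectOutputStream(new GZIPOutputStream(baos))) {
                oos.writeObject(map);
            }
            content = baos.toByteArray();
            out.defaultWriteObject();
            content = null;
        }

        @SuppressWarnings("unchecked")
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            try (ObjectInputStream ois = new ObjectInputStream(
                     new GZIPInputStream(new ByteArrayInputStream(content)))) {
                map = (Map<String, String>) ois.readObject();
            }
            content = null;
        }

        Map<String, String> getMap() {
            return map;
        }
    }

    // Writing side: no outer GZIPOutputStream, MapHolder compresses internally.
    static void write(Map<String, String> map, OutputStream target)
            throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(target)) {
            out.writeObject(new MapHolder(map));
        }
    }

    // Reading side: mirrors the snippet above, but accepts any InputStream.
    static Map<String, String> read(InputStream source)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(source)) {
            return ((MapHolder) ois.readObject()).getMap();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> original = new HashMap<>();
        original.put("hello", "world");
        original.put("foo", "bar");

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        write(original, buffer);

        Map<String, String> restored =
            read(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(restored.equals(original));
    }
}
```

Whether this actually loads faster than the original gzipped stream for a 4-million-entry map is something to measure on your own data; the sketch only shows that the two sides of the format fit together.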