How do I read data from nested zip files in Java without using temporary files?

R T*_*ter 3 java zip zipinputstream

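I am trying to extract files from nested zip archives and process them in memory.

What this question is not about:

  1. How to read a zip file in Java: no, the question is how to read a zip file inside a zip file, and so on (nested zip files).

  2. Writing temporary results to disk: no, I am asking how to do all of this in memory. I found many answers that use the less efficient technique of writing intermediate results to disk, but that is not what I want to do.

Example:

Zip file -> Zipfile1 -> Zipfile2 -> Zipfile3

Goal: extract the data found in each nested zip file, entirely in memory and using Java.

You say ZipFile is the answer? No, it is not. It works for the first level, i.e.:

Zip file -> Zipfile1

But once you reach Zipfile2 and execute: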

ZipInputStream z = new ZipInputStream(zipFile.getInputStream( zipEntry) ) ;

you get a NullPointerException.

My code:

public class ZipHandler {

    String findings = new String();
    ZipFile zipFile = null;

    public void init(String fileName) throws AppException{

        try {
            // read file into stream
            zipFile = new ZipFile(fileName);
            Enumeration<?> enu = zipFile.entries();
            exctractInfoFromZip(enu);

            zipFile.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
}

//The idea was recursively extract entries using ZipFile
public void exctractInfoFromZip(Enumeration<?> enu) throws IOException, AppException{   

    try {
        while (enu.hasMoreElements()) { 
            ZipEntry zipEntry = (ZipEntry) enu.nextElement();

            String name = zipEntry.getName();
            long size = zipEntry.getSize();
            long compressedSize = zipEntry.getCompressedSize();

            System.out.printf("name: %-20s | size: %6d | compressed size: %6d\n", 
                    name, size, compressedSize);

            // directory ?
            if (zipEntry.isDirectory()) {
                System.out.println("dir found:" + name);
                findings+=", " + name; 
                continue;
            } 

            if (name.toUpperCase().endsWith(".ZIP") ||  name.toUpperCase().endsWith(".GZ")) {
                String fileType = name.substring(
                        name.lastIndexOf(".")+1, name.length());

                System.out.println("File type:" + fileType);
                System.out.println("zipEntry: " + zipEntry);

                if (fileType.equalsIgnoreCase("ZIP")) {
//ZipFile here returns a NULL pointer when you try to get the first nested zip
                    ZipInputStream z = new ZipInputStream(zipFile.getInputStream(zipEntry) ) ;
                    System.out.println("Opening ZIP as stream: " + name);

                    findings+=", " + name;

                    exctractInfoFromZip(zipInputStreamToEnum(z));
                } else if (fileType.equalsIgnoreCase("GZ")) {
//ZipFile here returns a NULL pointer when you try to get the first nested zip      
                    GZIPInputStream z = new GZIPInputStream(zipFile.getInputStream(zipEntry) ) ;
                    System.out.println("Opening ZIP as stream: " + name);

                    findings+=", " + name;

                    exctractInfoFromZip(gZipInputStreamToEnum(z));
                } else
                    throw new AppException("extension not recognized!");
            } else {
                System.out.println(name);
                findings+=", " + name;
            }
        }
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    System.out.println("Findings " + findings);
} 

public Enumeration<?> zipInputStreamToEnum(ZipInputStream zStream) throws IOException{

    List<ZipEntry> list = new ArrayList<ZipEntry>();    

    while (zStream.available() != 0) {
        list.add(zStream.getNextEntry());
    }

    return Collections.enumeration(list);
} 

JMa*_*Max 6

I haven't tried it, but with ZipInputStream you can read any InputStream that contains a ZIP file as data. Iterate through the entries, and when you find the right entry, use that ZipInputStream to create another, nested ZipInputStream.

The following code demonstrates this. Imagine we have a readme.txt inside 0.zip, which is in turn zipped into 1.zip, which is zipped into 2.zip. Now we read some text from readme.txt:

try (FileInputStream fin = new FileInputStream("D:/2.zip")) {
    ZipInputStream firstZip = new ZipInputStream(fin);
    ZipInputStream zippedZip = new ZipInputStream(findEntry(firstZip, "1.zip"));
    ZipInputStream zippedZippedZip = new ZipInputStream(findEntry(zippedZip, "0.zip"));

    ZipInputStream zippedZippedZippedReadme = findEntry(zippedZippedZip, "readme.txt");
    InputStreamReader reader = new InputStreamReader(zippedZippedZippedReadme);
    char[] cbuf = new char[1024];
    int read = reader.read(cbuf);
    System.out.println(new String(cbuf, 0, read));
    // ..... (remaining processing elided in the original)
}

public static ZipInputStream findEntry(ZipInputStream in, String name) throws IOException {
    ZipEntry entry = null;
    while ((entry = in.getNextEntry()) != null) {
        if (entry.getName().equals(name)) {
            return in;
        }
    }
    return null;
}

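Note that this code is really ugly: it does not close the nested streams and does not check for errors. It is only a minimized version that demonstrates how the approach works.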
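As a rough sketch of how the same idea could be tidied up, the class below walks an archive recursively and records every entry it finds, descending into any entry whose name ends in ".zip". The class name NestedZipWalker, the method names, and the "!/" separator used in the reported paths are assumptions made for illustration; they are not from the question or from any library. Only the outermost stream is closed, because closing a nested ZipInputStream would also close the stream it wraps.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of the original answer.
final class NestedZipWalker {

    /** Returns the paths of all entries, including those inside nested .zip entries. */
    static List<String> listEntries(InputStream source) throws IOException {
        List<String> findings = new ArrayList<>();
        // Closing the outermost ZipInputStream releases the whole chain.
        try (ZipInputStream zip = new ZipInputStream(source)) {
            walk(zip, "", findings);
        }
        return findings;
    }

    private static void walk(ZipInputStream zip, String prefix, List<String> findings)
            throws IOException {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            String path = prefix + entry.getName();
            findings.add(path);
            boolean isZip = !entry.isDirectory()
                    && entry.getName().toLowerCase(Locale.ROOT).endsWith(".zip");
            if (isZip) {
                // The nested stream reads the current entry's data straight from 'zip'.
                // It is deliberately left open: closing it would close 'zip' as well.
                ZipInputStream nested = new ZipInputStream(zip);
                walk(nested, path + "!/", findings);
            }
        }
    }
}
```

Calling NestedZipWalker.listEntries(new FileInputStream("D:/2.zip")) on the archive from the example would be one way to use it. A ".gz" entry would need a GZIPInputStream instead, which is a single compressed stream with no entries to iterate.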

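In theory there is no limit to how many ZipInputStreams you can cascade into one another. The data is never written to a temporary file. Decompression is performed only on demand, as you read from each InputStream.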
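To illustrate that nothing has to touch the disk, here is a minimal sketch that builds a three-level archive in a byte array and then reads readme.txt back through cascaded ZipInputStreams. The class InMemoryNestedZip and its methods zipOf and read are hypothetical names chosen for this example; only the java.util.zip and java.io classes are real API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the original answer.
final class InMemoryNestedZip {

    /** Wraps 'content' in a one-entry ZIP archive held in a byte array. */
    static byte[] zipOf(String entryName, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(content);
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Follows 'path' through nested archives and returns the last entry's bytes. */
    static byte[] read(byte[] archive, String... path) throws IOException {
        try (InputStream source = new ByteArrayInputStream(archive)) {
            InputStream current = source;
            for (String name : path) {
                ZipInputStream zip = new ZipInputStream(current);
                ZipEntry entry;
                do {
                    entry = zip.getNextEntry();
                } while (entry != null && !entry.getName().equals(name));
                if (entry == null) {
                    throw new IOException("Entry not found: " + name);
                }
                // 'zip' is now positioned at the start of the matching entry.
                current = zip;
            }
            // Data is decompressed level by level only as these reads pull it through.
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = current.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
            return result.toByteArray();
        }
    }
}
```

With zip0 = zipOf("readme.txt", text), zip1 = zipOf("0.zip", zip0) and zip2 = zipOf("1.zip", zip1), the call read(zip2, "1.zip", "0.zip", "readme.txt") returns the original text bytes. The result is buffered in memory here for simplicity; a caller handling large entries could return the positioned stream instead.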