Android GZIP decompression corrupts Unicode characters at the buffer boundary

Raf*_*ael 2 java unicode android gzip

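I decompress GZIP data that I receive into a string. The problem: when I set BUFFER_SIZE to 512, Unicode characters get corrupted at the buffer boundary, and I end up with text containing question marks. It happens with non-Latin alphabets.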

...?? ? ???????...

    public static String decompress(byte[] compressed) throws IOException {
        final int BUFFER_SIZE = 512;
        ByteArrayInputStream is = new ByteArrayInputStream(compressed);
        GZIPInputStream gis = new GZIPInputStream(is, BUFFER_SIZE);
        StringBuilder string = new StringBuilder();
        byte[] data = new byte[BUFFER_SIZE];
        int bytesRead;
        while ((bytesRead = gis.read(data)) != -1) {
            string.append(new String(data, 0, bytesRead));
        }
        gis.close();
        is.close();
        return string.toString();
    }

Joo*_*gen 5

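The bug is in the algorithm: it assumes that every chunk read ends (and starts) on a UTF-8 byte-sequence boundary. When a multi-byte character straddles two reads, each half is decoded on its own as invalid input, which is where the question marks come from.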
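To make the failure concrete, here is a minimal sketch that decodes a byte array chunk by chunk, the way the loop in the question does. The class name `SplitDecodeDemo` and the helper `decodePerChunk` are made up for illustration. The question's code calls `new String(data, 0, bytesRead)` without a charset, so it uses the platform default; the sketch passes UTF-8 explicitly, on the assumption that this is the encoding of the data.

```java
import java.nio.charset.StandardCharsets;

class SplitDecodeDemo {
    // Hypothetical helper: decodes every fixed-size chunk independently,
    // mirroring the per-read new String(...) call in the question.
    static String decodePerChunk(byte[] bytes, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < bytes.length; off += chunkSize) {
            int len = Math.min(chunkSize, bytes.length - off);
            sb.append(new String(bytes, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'a' is 1 byte in UTF-8, U+044F (Cyrillic "ya") is 2 bytes: 3 bytes total
        String text = "a\u044F";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);

        // A chunk size of 2 cuts the 2-byte character in half
        String broken = decodePerChunk(bytes, 2);
        // Decoding all bytes in one go keeps the character intact
        String intact = new String(bytes, StandardCharsets.UTF_8);

        System.out.println(broken.equals(text)); // false
        System.out.println(intact.equals(text)); // true
    }
}
```

With a 512-byte buffer the same thing happens whenever a character happens to sit across a read boundary, so only a few characters in a long text are affected.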

So do this instead:

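    // Replaces the body of decompress(); BUFFER_SIZE is the constant from the question
    ByteArrayInputStream is = new ByteArrayInputStream(compressed);
    GZIPInputStream gis = new GZIPInputStream(is, BUFFER_SIZE);
    byte[] data = new byte[BUFFER_SIZE];
    int bytesRead;
    // Collect the raw bytes first; do not decode partial chunks
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    while ((bytesRead = gis.read(data)) != -1) {
        baos.write(data, 0, bytesRead);
    }
    gis.close();
    is.close();
    // Decode once, when all bytes are available
    return baos.toString("UTF-8");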
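An alternative, not part of the answer above, is to let an `InputStreamReader` do the decoding. Its decoder keeps the incomplete bytes of a character between reads, so the chunk boundary no longer matters. The sketch below assumes the payload is UTF-8; the class name `GzipText` and the `compress` helper are hypothetical and exist only to make the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipText {
    // Decompresses GZIP data and decodes it as UTF-8 (assumed encoding).
    static String decompress(byte[] compressed) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader and the wrapped streams
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)),
                StandardCharsets.UTF_8)) {
            char[] buf = new char[512];
            int n;
            // The reader hands back whole decoded chars, never half a byte sequence
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Hypothetical helper, only used to produce test input for the demo.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(baos)) {
            gos.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Mix of 1-, 2- and 3-byte characters, long enough to span many reads
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("a\u044F\u20AC");
        }
        String original = sb.toString();
        String roundTrip = decompress(compress(original));
        System.out.println(roundTrip.equals(original)); // true
    }
}
```

Compared with collecting everything in a `ByteArrayOutputStream`, this avoids holding both the full byte array and the final string in memory at the same time, which may matter for large responses.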