Android GZIP decompression corrupts Unicode characters at buffer boundaries

Question

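I decompress received GZIP data into a string. The problem is that when I set BUFFER_SIZE to 512, Unicode characters get corrupted at the buffer boundaries. As a result, I get text containing question marks. It happens with non-Latin alphabets.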

...?? ? ???????...

    public static String decompress(byte[] compressed) throws IOException {
        final int BUFFER_SIZE = 512;
        ByteArrayInputStream is = new ByteArrayInputStream(compressed);
        GZIPInputStream gis = new GZIPInputStream(is, BUFFER_SIZE);
        StringBuilder string = new StringBuilder();
        byte[] data = new byte[BUFFER_SIZE];
        int bytesRead;
        while ((bytesRead = gis.read(data)) != -1) {
            string.append(new String(data, 0, bytesRead));
        }
        gis.close();
        is.close();
        return string.toString();
    }


Answer 1

Joo*_*gen 5

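The bug is in the algorithm: it assumes that every chunk read ends (and starts) on a UTF-8 byte-sequence boundary. When a multi-byte character straddles two reads, each `new String(data, 0, bytesRead)` call sees only part of the sequence and decodes it as a replacement character.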
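
To make the failure concrete, here is a minimal sketch that reproduces it without GZIP at all. The class name `Utf8SplitDemo`, the helper `decodePerChunk`, and the sample text are made up for illustration; the helper simply decodes each chunk on its own, the way the loop in the question does.

```java
import java.nio.charset.StandardCharsets;

class Utf8SplitDemo {
    // Hypothetical helper: decodes every chunk independently,
    // mirroring new String(data, 0, bytesRead) inside the read loop.
    static String decodePerChunk(byte[] bytes, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < bytes.length; off += chunkSize) {
            int len = Math.min(chunkSize, bytes.length - off);
            sb.append(new String(bytes, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Six Cyrillic letters; each one takes 2 bytes in UTF-8 (12 bytes total).
        String original = "\u041f\u0440\u0438\u0432\u0435\u0442";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // A chunk size of 5 cuts the third letter in half.
        String broken = decodePerChunk(utf8, 5);
        // Decoding the whole array at once keeps every sequence intact.
        String intact = new String(utf8, StandardCharsets.UTF_8);

        System.out.println(original.equals(broken)); // false
        System.out.println(original.equals(intact)); // true
    }
}
```

With a chunk size that happens to align with the character boundaries (4 in this example), the per-chunk decode succeeds, which is why the problem only shows up for some inputs and buffer sizes.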

So do this instead:

    public static String decompress(byte[] compressed) throws IOException {
        final int BUFFER_SIZE = 512;
        ByteArrayInputStream is = new ByteArrayInputStream(compressed);
        GZIPInputStream gis = new GZIPInputStream(is, BUFFER_SIZE);
        byte[] data = new byte[BUFFER_SIZE];
        int bytesRead;
        // Collect all decompressed bytes first; decode them once at the end.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        while ((bytesRead = gis.read(data)) != -1) {
            baos.write(data, 0, bytesRead);
        }
        gis.close();
        is.close();
        return baos.toString("UTF-8");
    }
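
An alternative that is not part of the answer above, sketched here under the assumption that the payload is UTF-8: wrap the `GZIPInputStream` in an `InputStreamReader`. The reader carries an incomplete byte sequence over to the next read, so chunk boundaries no longer matter. The class name `GzipText` and the `compress` helper are hypothetical and exist only to make the round trip runnable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipText {
    // Decodes while reading: the reader buffers partial UTF-8 sequences
    // internally, so a character split across two reads is still decoded whole.
    static String decompress(byte[] compressed) throws IOException {
        final int BUFFER_SIZE = 512;
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed), BUFFER_SIZE),
                StandardCharsets.UTF_8)) {
            char[] chars = new char[BUFFER_SIZE];
            int charsRead;
            while ((charsRead = reader.read(chars)) != -1) {
                out.append(chars, 0, charsRead);
            }
        }
        return out.toString();
    }

    // Hypothetical helper, used only to produce test input for the round trip.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(baos)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A 1-byte prefix shifts every following 2-byte letter onto an odd offset,
        // so some letters are bound to straddle a 512-byte boundary.
        StringBuilder sb = new StringBuilder("a");
        for (int i = 0; i < 2000; i++) {
            sb.append('\u0436');
        }
        String text = sb.toString();
        System.out.println(text.equals(decompress(compress(text))));
    }
}
```

Both approaches decode correctly. Collecting bytes in a `ByteArrayOutputStream` is the smaller change to the original code, while the reader variant fits better when the text is consumed incrementally rather than returned as one string.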

