Deflater.deflate and small output buffers

Joa*_*elt 8 java deflate

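I'm seeing odd behavior on Java 8u45 when calling the java.util.Deflater.deflate(byte[] b, int off, int len, int flush) method with a small output buffer.

(I'm working on some low-level networking code related to WebSocket's upcoming permessage-deflate extension, so small buffers are a reality for me.)

Example code: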

package deflate;

import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class DeflaterSmallBufferBug
{
    public static void main(String[] args)
    {
        boolean nowrap = true;
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION,nowrap);

        byte[] input = "Hello".getBytes(StandardCharsets.UTF_8);

        System.out.printf("input is %,d bytes - %s%n",input.length,getHex(input,0,input.length));

        deflater.setInput(input);

        byte[] output = new byte[input.length];

        // break out of infinite loop seen with bug
        int maxloops = 10;

        // Compress the data
        while (maxloops-- > 0)
        {
            int compressed = deflater.deflate(output,0,output.length,Deflater.SYNC_FLUSH);
            System.out.printf("compressed %,d bytes - %s%n",compressed,getHex(output,0,compressed));

            if (compressed < output.length)
            {
                System.out.printf("Compress success");
                return;
            }
        }

        System.out.printf("Exited compress (maxloops left %d)%n",maxloops);
    }

    private static String getHex(byte[] buf, int offset, int len)
    {
        StringBuilder hex = new StringBuilder();
        hex.append('[');
        for (int i = offset; i < (offset + len); i++)
        {
            if (i > offset)
            {
                hex.append(' ');
            }
            hex.append(String.format("%02X",buf[i]));
        }
        hex.append(']');
        return hex.toString();
    }
}

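In the example above, I'm trying to generate compressed bytes for the input "Hello" using an output buffer that is 5 bytes long.

I expected the following resulting bytes: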

buffer 1 [ F2 48 CD C9 C9 ]
buffer 2 [ 07 00 00 00 FF ]
buffer 3 [ FF ]

Which translates to

[ F2 48 CD C9 C9 07 00 ] <-- the compressed data
[ 00 00 FF FF ]          <-- the deflate tail bytes

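However, when Deflater.deflate() is used with a small buffer, this otherwise normal loop never terminates: every call returns another 5 bytes of compressed data (this seems to show up only with buffers of 5 bytes or fewer).

Output from running the demo above: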

input is 5 bytes - [48 65 6C 6C 6F]
compressed 5 bytes - [F2 48 CD C9 C9]
compressed 5 bytes - [07 00 00 00 FF]
compressed 5 bytes - [FF 00 00 00 FF]
compressed 5 bytes - [FF 00 00 00 FF]
compressed 5 bytes - [FF 00 00 00 FF]
compressed 5 bytes - [FF 00 00 00 FF]
compressed 5 bytes - [FF 00 00 00 FF]
compressed 5 bytes - [FF 00 00 00 FF]
compressed 5 bytes - [FF 00 00 00 FF]
compressed 5 bytes - [FF 00 00 00 FF]
Exited compress (maxloops left -1)

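If the input/output is larger than 5 bytes, the problem seems to go away. (Just change the input string to "Hellox" to test this for yourself.)

Results with a 6-byte buffer (input "Hellox"):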

input is 6 bytes - [48 65 6C 6C 6F 78]
compressed 6 bytes - [F2 48 CD C9 C9 AF]
compressed 6 bytes - [00 00 00 00 FF FF]
compressed 5 bytes - [00 00 00 FF FF]
Compress success

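Even these results look a little quirky to me, as there appear to be 2 deflate tail-byte sequences present.

So I guess my ultimate question is: am I missing something about Deflater usage that explains this odd behavior, or does this point to a possible bug in the JVM Deflater implementation itself?

Update: Aug 7, 2015

This finding has been accepted as bugs.java.com/JDK-8133170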

Mar*_*ler 6

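This is a zlib "feature", documented in zlib.h:

In the case of a Z_FULL_FLUSH or Z_SYNC_FLUSH, make sure that avail_out is greater than six to avoid repeated flush markers due to avail_out == 0 on return.

What is happening is that every call of deflate() with Z_SYNC_FLUSH inserts a five-byte flush marker. Since you are not providing enough output space to receive the marker, you call again to get more output, but that call also asks for another flush marker to be inserted.

What you should be doing is calling deflate() with Z_SYNC_FLUSH once, and then getting all of the available output with additional deflate() calls, if necessary, that use Z_NO_FLUSH (or NO_FLUSH in Java).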
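As a rough illustration of the "flush only once" idea, here is a minimal sketch of a conservative variation. Rather than draining with NO_FLUSH, it makes the single SYNC_FLUSH call into a scratch buffer that is assumed to be large enough, and only afterwards cuts the result into small chunks. The class and method names (SyncFlushChunks, compress, split, inflate) are hypothetical and not part of the original post, and the scratch size of input.length + 64 is an assumption that should hold for small messages; the code refuses to continue if the buffer fills up, because the flush may then be incomplete.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper for illustration only; names are not from the original post.
class SyncFlushChunks
{
    // Compress with raw deflate (nowrap) and exactly one SYNC_FLUSH call.
    static byte[] compress(byte[] input)
    {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try
        {
            deflater.setInput(input);
            // Assumption: input.length + 64 leaves far more than six spare bytes
            // for small messages, so the flush marker fits in this one call.
            byte[] scratch = new byte[input.length + 64];
            int n = deflater.deflate(scratch, 0, scratch.length, Deflater.SYNC_FLUSH);
            if (n == scratch.length)
            {
                // A completely filled buffer means the flush may not be complete.
                throw new IllegalStateException("scratch buffer too small");
            }
            return Arrays.copyOf(scratch, n);
        }
        finally
        {
            deflater.end();
        }
    }

    // Cut already-compressed bytes into chunks of at most chunkSize bytes.
    static List<byte[]> split(byte[] data, int chunkSize)
    {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize)
        {
            chunks.add(Arrays.copyOfRange(data, off, Math.min(data.length, off + chunkSize)));
        }
        return chunks;
    }

    // Round-trip check: inflate raw deflate data until no more output is produced.
    static byte[] inflate(byte[] compressed)
    {
        Inflater inflater = new Inflater(true);
        try
        {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            int n;
            while ((n = inflater.inflate(buf)) > 0)
            {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
        catch (DataFormatException e)
        {
            throw new IllegalStateException(e);
        }
        finally
        {
            inflater.end();
        }
    }
}
```

Calling SyncFlushChunks.split(SyncFlushChunks.compress(input), 5) would then hand out 5-byte pieces without ever asking zlib for a second flush. The trade-off is an extra copy and a scratch buffer sized to the message, which may or may not be acceptable in low-level networking code.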