GZIPInputStream to String

Mat*_*att 33 java gzip http gzipinputstream

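First of all, I apologize if my terminology is a bit amateurish; please bear with me ;)

I am trying to convert the gzipped body of an HTTP response to plain text. I have taken the byte array of this response and converted it to a ByteArrayInputStream, which I then wrapped in a GZIPInputStream. I now want to read the GZIPInputStream and store the final decompressed HTTP response body as a plain-text String.

This code stores the final decompressed content in an OutputStream, but I want to store the content as a String: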

public static int sChunk = 8192;
ByteArrayInputStream bais = new ByteArrayInputStream(responseBytes);
GZIPInputStream gzis = new GZIPInputStream(bais);
byte[] buffer = new byte[sChunk];
int length;
while ((length = gzis.read(buffer, 0, sChunk)) != -1) {
        out.write(buffer, 0, length);
}

Viv*_*sse 46

To decode bytes from an InputStream, you can use an InputStreamReader. A BufferedReader then lets you read the stream line by line.

Your code will look like this:

ByteArrayInputStream bais = new ByteArrayInputStream(responseBytes);
GZIPInputStream gzis = new GZIPInputStream(bais);
InputStreamReader reader = new InputStreamReader(gzis);
BufferedReader in = new BufferedReader(reader);

String readed;
while ((readed = in.readLine()) != null) {
    System.out.println(readed);
}

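  • Apart from the potential encoding bug, note that this approach eats newlines. So if you want to preserve newlines in the output, you really need to add them explicitly to the `output` yourself (for example with `PrintWriter#println()` or `BufferedWriter#newLine()`). Or just use a `char[] buffer` loop, as shown in the other answer, which does not eat newlines. (2 upvotes)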
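
As a minimal sketch of the first option mentioned in the comment above, the lines returned by `readLine()` can be rejoined with an explicit separator. The class name `GzipLines`, the method name `decodeLines`, and the choice of `\n` as separator are assumptions made for illustration. Note that every line terminator is normalized to `\n` and a trailing newline is not restored.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class; the names are illustrative only.
final class GzipLines {

    // Decompresses gzipped bytes, decodes them with an explicit charset,
    // and rejoins the lines with "\n" so the line breaks are not lost.
    static String decodeLines(byte[] gzipped, Charset charset) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipped)), charset))) {
            return in.lines().collect(Collectors.joining("\n"));
        }
    }
}
```

For example, `GzipLines.decodeLines(responseBytes, StandardCharsets.UTF_8)` would return the body with `\n` between lines, assuming the response really is UTF-8 encoded.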

Bal*_*usC 33

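You should rather have obtained the response as an InputStream instead of as a byte[]. Then you can ungzip it using GZIPInputStream, read it as character data using InputStreamReader, and finally write it as character data into a String using StringWriter.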

String body = null;
String charset = "UTF-8"; // You should determine it based on response header.

try (
    InputStream gzippedResponse = response.getInputStream();
    InputStream ungzippedResponse = new GZIPInputStream(gzippedResponse);
    Reader reader = new InputStreamReader(ungzippedResponse, charset);
    Writer writer = new StringWriter();
) {
    char[] buffer = new char[10240];
    for (int length = 0; (length = reader.read(buffer)) > 0;) {
        writer.write(buffer, 0, length);
    }
    body = writer.toString();
}

// ...
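
The comment in the code above says the charset should be determined from the response header. A rough sketch of that step follows, assuming the `Content-Type` header value is already available as a String (how to obtain it depends on the HTTP client in use). The class `ResponseCharset` and the method `fromContentType` are hypothetical names, and the parsing is deliberately simplistic rather than a full implementation of the header grammar.

```java
import java.nio.charset.Charset;

// Hypothetical helper class; not part of any library.
final class ResponseCharset {

    // Extracts the charset parameter from a Content-Type header value such as
    // "text/html; charset=ISO-8859-1". Returns the fallback when the header is
    // missing, has no charset parameter, or names an unknown charset.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String trimmed = param.trim();
            // Case-insensitive check for the "charset=" prefix.
            if (trimmed.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = trimmed.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

The result could then be passed to the `InputStreamReader` constructor in place of the hard-coded `"UTF-8"`, since `InputStreamReader` also accepts a `Charset`.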


If your final intent is to parse the response as HTML, then I strongly recommend just using an HTML parser like Jsoup for this. It is then as easy as:

String html = Jsoup.connect("http://google.com").get().html();


Mis*_*bas 6

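Use the try-with-resources idiom (which automatically closes all resources opened in try(...) on exit from the block) to make the code cleaner.

Use Apache Commons IO's IOUtils to convert the InputStream to a String using the default charset.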

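import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import org.apache.commons.io.IOUtils;

public static String gzipFileToString(File file) throws IOException {
    try (GZIPInputStream gzipIn = new GZIPInputStream(new FileInputStream(file))) {
        // Decodes the decompressed bytes with the platform default charset.
        return IOUtils.toString(gzipIn);
    }
}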
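
If adding a dependency is not an option, a similar result can be had with the JDK alone. The following is a sketch assuming Java 9 or later (for `InputStream#readAllBytes()`). The class name `GzipFiles` and the extra `charset` parameter are assumptions added here so that the decoding does not depend on the platform default.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class; the names are illustrative only.
final class GzipFiles {

    // Reads a gzipped file fully into memory and decodes it with the given
    // charset. The stream is closed automatically by try-with-resources.
    static String gzipFileToString(Path file, Charset charset) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(file))) {
            return new String(in.readAllBytes(), charset);
        }
    }
}
```

Because the whole content is buffered in memory, this is best suited to files of moderate size.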