Ten*_*enG 1 java gzipinputstream
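I searched for an example of how to compress a string in Java.
I have one function that compresses and another that decompresses. Compression seems to work fine: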
public static String encStage1(String str)
{
String format1 = "ISO-8859-1";
String format2 = "UTF-8";
if (str == null || str.length() == 0)
{
return str;
}
System.out.println("String length : " + str.length());
ByteArrayOutputStream out = new ByteArrayOutputStream();
String outStr = null;
try
{
GZIPOutputStream gzip = new GZIPOutputStream(out);
gzip.write(str.getBytes());
gzip.close();
outStr = out.toString(format2);
System.out.println("Output String lenght : " + outStr.length());
} catch (Exception e)
{
e.printStackTrace();
}
return outStr;
}
But the reverse, decStage3, complains that the string is not in GZIP format, even when I pass the return value of encStage1 straight back into it:
public static String decStage3(String str)
{
if (str == null || str.length() == 0)
{
return str;
}
System.out.println("Input String length : " + str.length());
String outStr = "";
try
{
String format1 = "ISO-8859-1";
String format2 = "UTF-8";
GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str.getBytes(format2)));
BufferedReader bf = new BufferedReader(new InputStreamReader(gis, format2));
String line;
while ((line = bf.readLine()) != null)
{
outStr += line;
}
System.out.println("Output String lenght : " + outStr.length());
} catch (Exception e)
{
e.printStackTrace();
}
return outStr;
}
I get this error when calling it with the string returned from encStage1:
public String encIDData(String idData)
{
String tst = "A simple test string";
System.out.println("Enc 0: " + tst);
String stg1 = encStage1(tst);
System.out.println("Enc 1: " + toHex(stg1));
String dec1 = decStage3(stg1);
System.out.println("unzip: " + toHex(dec1));
}
Output/error:
Enc 0: A simple test string
String length : 20
Output String lenght : 40
Enc 1: 1fefbfbd0800000000000000735428efbfbdefbfbd2defbfbd495528492d2e51282e29efbfbdefbfbd4b07005aefbfbd21efbfbd14000000
Input String length : 40
java.io.IOException: Not in GZIP format
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:137)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
A minor error is:
gzip.write(str.getBytes());
This uses the platform default encoding, which on Windows will never be ISO-8859-1. Better:
gzip.write(str.getBytes(format1));
You might consider using "Cp1252" (Windows Latin-1, for some European languages) instead of "ISO-8859-1" (Latin-1). Cp1252 adds characters such as typographic (curly) quotes.
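To make the difference between these encodings concrete, here is a minimal sketch that prints the bytes produced for an accented letter and a curly quote. The class name `CharsetBytesDemo` and the `hex` helper are illustrative only, and the lookup of "windows-1252" assumes the JDK in use ships that charset (standard desktop JDKs do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetBytesDemo
{
    // Encode text in the given charset and render the resulting bytes as hex.
    static String hex(String text, Charset cs)
    {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(cs))
        {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args)
    {
        // Assumption: this charset name is available on the running JDK.
        Charset cp1252 = Charset.forName("windows-1252");
        String eAcute = "\u00e9";     // e with acute accent
        String curlyQuote = "\u201c"; // left curly double quote

        System.out.println("e-acute  ISO-8859-1: " + hex(eAcute, StandardCharsets.ISO_8859_1));
        System.out.println("e-acute  Cp1252:     " + hex(eAcute, cp1252));
        System.out.println("e-acute  UTF-8:      " + hex(eAcute, StandardCharsets.UTF_8));
        // ISO-8859-1 has no curly quote, so getBytes substitutes '?' (0x3f).
        System.out.println("quote    ISO-8859-1: " + hex(curlyQuote, StandardCharsets.ISO_8859_1));
        System.out.println("quote    Cp1252:     " + hex(curlyQuote, cp1252));
        System.out.println("quote    UTF-8:      " + hex(curlyQuote, StandardCharsets.UTF_8));
    }
}
```

The point of the sketch is only that `getBytes()` without an argument makes the result depend on whichever of these encodings happens to be the platform default.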
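The main error is converting the compressed bytes to a String. Java separates binary data (byte[], InputStream, OutputStream) from text (String, char, Reader, Writer), and text is always held internally as Unicode. An arbitrary byte sequence need not be valid UTF-8, so decoding it as UTF-8 replaces the invalid bytes and the original data is lost. You might get away with it by converting the bytes using a single-byte encoding (such as ISO-8859-1).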
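The corruption can be reproduced in isolation. The following is a minimal sketch, with an illustrative class name `BytesViaStringDemo` and helper names that are not part of the question's code: it compresses the same test string, pushes the bytes through a String and back, and checks whether they survive.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

class BytesViaStringDemo
{
    // Compress text to raw GZIP bytes.
    static byte[] gzip(String text) throws IOException
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out))
        {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    // Decode the bytes as text in the given charset, encode them back,
    // and report whether the result is byte-for-byte identical.
    static boolean survivesRoundTrip(byte[] data, Charset cs)
    {
        byte[] back = new String(data, cs).getBytes(cs);
        return Arrays.equals(data, back);
    }

    public static void main(String[] args) throws IOException
    {
        byte[] compressed = gzip("A simple test string");
        System.out.println("UTF-8 round trip intact:      "
                + survivesRoundTrip(compressed, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1 round trip intact: "
                + survivesRoundTrip(compressed, StandardCharsets.ISO_8859_1));
    }
}
```

The second byte of every GZIP stream is 0x8b, which is not valid on its own in UTF-8. Decoding replaces it with U+FFFD, which re-encodes as ef bf bd; that is the `1fefbfbd08...` prefix visible in the hex dump in the question.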
The best approach would be
gzip.write(str.getBytes(StandardCharsets.UTF_8));
That way you have full Unicode, so text in any script, or a mix of scripts, can be represented.
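Then decompress into a ByteArrayOutputStream and build the result with new String(baos.toByteArray(), StandardCharsets.UTF_8). Using a BufferedReader on an InputStreamReader with UTF-8 is fine too, but readLine throws away the line terminators, so they have to be added back: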
outStr += line + "\r\n"; // Or so.
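As a small illustration of why the "Or so" caveat matters, the sketch below reads lines from an in-memory string rather than a GZIP stream. The class `ReadLineDemo` and method `joinLines` are hypothetical names used only for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class ReadLineDemo
{
    // Read text line by line and append the given separator after each line.
    static String joinLines(String text, String separator) throws IOException
    {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(text)))
        {
            String line;
            while ((line = reader.readLine()) != null)
            {
                // readLine has already stripped "\n", "\r" or "\r\n" here.
                sb.append(line).append(separator);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException
    {
        String original = "a\r\nb\nc";
        // Without a separator the line structure is lost entirely.
        System.out.println(joinLines(original, "").equals("abc"));
        // With a fixed separator the text is readable again, but it is not
        // identical to the input: terminators are normalized and one is
        // appended after the last line.
        System.out.println(joinLines(original, "\n").equals(original));
    }
}
```

If the decompressed text has to match the input exactly, reading the raw bytes and decoding them once avoids this issue.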
A clean solution:
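import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Compresses the UTF-8 bytes of str. The result is binary, so it is returned as byte[].
public static byte[] encStage1(String str) throws IOException
{
try (ByteArrayOutputStream out = new ByteArrayOutputStream())
{
// Closing the GZIPOutputStream writes the GZIP trailer before the bytes are read.
try (GZIPOutputStream gzip = new GZIPOutputStream(out))
{
gzip.write(str.getBytes(StandardCharsets.UTF_8));
}
return out.toByteArray();
//return out.toString(StandardCharsets.ISO_8859_1);
// Some single byte encoding
}
}
// Decompresses GZIP bytes and decodes the result as UTF-8 text.
public static String decStage3(byte[] str) throws IOException
{
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(str)))
{
int b;
while ((b = gis.read()) != -1) {
baos.write((byte) b);
}
}
return new String(baos.toByteArray(), StandardCharsets.UTF_8);
}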
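If the compressed data really does have to travel as a String (for example in a text field), Base64 is a common alternative to a single-byte encoding. This is not part of the answer above; the sketch below is one possible variant under that assumption, and the class and method names (`GzipBase64Demo`, `compressToText`, `decompressFromText`) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBase64Demo
{
    // GZIP the UTF-8 bytes of str, then encode the binary result as Base64 text.
    static String compressToText(String str) throws IOException
    {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out))
        {
            gzip.write(str.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Reverse of compressToText: Base64-decode, gunzip, decode as UTF-8.
    static String decompressFromText(String text) throws IOException
    {
        byte[] compressed = Base64.getDecoder().decode(text);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(compressed)))
        {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gis.read(buffer)) != -1)
            {
                baos.write(buffer, 0, n);
            }
        }
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException
    {
        String original = "A simple test string\r\nwith a second line";
        String encoded = compressToText(original);
        System.out.println("Encoded: " + encoded);
        System.out.println("Round trip equal: " + decompressFromText(encoded).equals(original));
    }
}
```

Base64 makes the payload about a third larger than the raw bytes, so for short inputs like the test string the "compressed" text ends up longer than the original.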