dma*_*tej 5 java string encoding character-encoding
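The following test fails on the conversion to Latin1, because characters that cannot be encoded are silently replaced with a byte of value 63 (a question mark). It would be better if such characters caused an exception.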
@Test
public void testEncoding() throws UnsupportedEncodingException {
final String czech = "Řízeček a šampáňo a žízeň";
// okay
final byte[] bytesInLatin2 = czech.getBytes("ISO8859-2");
// different bytes, but okay
final byte[] bytesInWin1250 = czech.getBytes("Windows-1250");
// different bytes, but okay
final byte[] bytesInUtf8 = czech.getBytes("UTF-8");
// nonsense; Ř, č, ... are not in the Latin1 character set!
final byte[] bytesInLatin1 = czech.getBytes("ISO8859-1");
System.out.println(Arrays.toString(bytesInLatin2));
System.out.println(Arrays.toString(bytesInWin1250));
System.out.println(Arrays.toString(bytesInUtf8));
System.out.println(Arrays.toString(bytesInLatin1));
System.out.flush();
final String latin2 = new String(bytesInLatin2, "ISO8859-2");
final String win1250 = new String(bytesInWin1250, "Windows-1250");
final String utf8 = new String(bytesInUtf8, "UTF-8");
final String latin1 = new String(bytesInLatin1, "ISO8859-1");
Assert.assertEquals("latin2", czech, latin2);
Assert.assertEquals("win1250", czech, win1250);
Assert.assertEquals("utf8", czech, utf8);
Assert.assertEquals("latin1", czech, latin1); // this test will fail!
}
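Because of this behavior in Java, there are many cases where data ends up corrupted. Is there a library that can verify whether a string can be encoded in a given encoding?
I suspect you're looking for CharsetEncoder.canEncode(CharSequence).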
Charset latin2 = Charset.forName("ISO8859-2");
boolean validInLatin2 = latin2.newEncoder().canEncode(czech);
...
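If an exception is preferred over a boolean check, one possible approach is to encode through a CharsetEncoder whose error actions are set to CodingErrorAction.REPORT, rather than through String.getBytes, which substitutes the replacement byte. The following is a minimal sketch, not a definitive implementation: the class name StrictEncoding, the helper names canEncode and encodeStrict, and the sample word are made up for illustration, and it assumes the JRE ships the ISO-8859-2 charset.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictEncoding {

    /** Returns true if every character of text can be represented in charset. */
    public static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    /**
     * Encodes text and throws a CharacterCodingException for characters the
     * charset cannot represent, instead of substituting '?' as getBytes does.
     */
    public static byte[] encodeStrict(String text, Charset charset)
            throws CharacterCodingException {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        // Hypothetical sample word, written with Unicode escapes so the
        // source file's own encoding cannot corrupt it.
        String sample = "\u0158\u00edze\u010dek";
        Charset latin2 = Charset.forName("ISO-8859-2");
        Charset latin1 = StandardCharsets.ISO_8859_1;

        System.out.println("Latin2 can encode: " + canEncode(sample, latin2));
        System.out.println("Latin1 can encode: " + canEncode(sample, latin1));

        try {
            encodeStrict(sample, latin1);
        } catch (CharacterCodingException e) {
            System.out.println("Latin1 encoding rejected: " + e);
        }
    }
}
```

A boolean check with canEncode suits validation up front, while the strict encoder suits code paths where silent corruption must never reach the output.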