lec*_*eur 5 java character-encoding
I have a problem with the CharsetDecoder class.
First code example (this one works):
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final ByteBuffer b = ByteBuffer.allocate(3);
final byte[] tab = new byte[]{(byte)-30, (byte)-126, (byte)-84}; //char €
for (int i=0; i<tab.length; i++){
    b.put(tab, i, 1);
}
try {
    b.flip();
    System.out.println("a" + dec.decode(b).toString() + "a");
} catch (CharacterCodingException e1) {
    e1.printStackTrace();
}
The result is a€a
But when I run this code:
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final CharBuffer chars = CharBuffer.allocate(3);
final byte[] tab = new byte[]{(byte)-30, (byte)-126, (byte)-84}; //char €
for (int i=0; i<tab.length; i++){
    ByteBuffer buffer = ByteBuffer.wrap(tab, i, 1);
    dec.decode(buffer, chars, i == 2);
}
dec.flush(chars);
System.out.println("a" + chars.toString() + "a");
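The result is a a
Why is the result different?
How can I use the decode(ByteBuffer, CharBuffer, endOfInput) method of the CharsetDecoder class to get the result a€a?
- EDIT -
So, using Jesper's code, I came up with this. It is not perfect, but it works with step = 1, 2 and 3: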
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final CharBuffer chars = CharBuffer.allocate(6);
final byte[] tab = new byte[]{(byte)97, (byte)-30, (byte)-126, (byte)-84, (byte)97, (byte)97}; //char €
final ByteBuffer buffer = ByteBuffer.allocate(10);
final int step = 3;
for (int i = 0; i < tab.length; i++) {
    // Add the next byte(s) to the buffer
    buffer.put(tab, i, step);
    i+=step-1;
    // Remember the current position
    final int pos = buffer.position();
    int l=chars.position();
    // Try to decode
    buffer.flip();
    final CoderResult result = dec.decode(buffer, chars, i >= tab.length -1);
    System.out.println(result);
    if (result.isUnderflow() && chars.position() == l) {
        // Underflow, prepare the buffer for more writing
        buffer.position(pos);
    }else{
        if (buffer.position() == buffer.limit()){
            //ByteBuffer decoded
            buffer.clear();
            buffer.position(0);
        }else{
            //a part of ByteBuffer is decoded. We keep only bytes which are not decoded
            final byte[] b = buffer.array();
            final int f = buffer.position();
            final int g = buffer.limit() - buffer.position();
            buffer.clear();
            buffer.position(0);
            buffer.put(b, f, g);
        }
    }
    buffer.limit(buffer.capacity());
}
dec.flush(chars);
chars.flip();
System.out.println(chars.toString());
The decode(ByteBuffer, CharBuffer, boolean) method returns a result, but you are ignoring it. If you print the result in the second code snippet:

for (int i = 0; i < tab.length; i++) {
    ByteBuffer buffer = ByteBuffer.wrap(tab, i, 1);
    System.out.println(dec.decode(buffer, chars, i == 2));
}

you will see this output:

UNDERFLOW
MALFORMED[1]
MALFORMED[1]
a a

Clearly, decoding does not work if you start in the middle of a character. The decoder expects the first thing it reads to be the start of a valid UTF-8 sequence.
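
To see why the second and third calls report MALFORMED, it can help to look at the bit patterns of the three bytes. The sketch below is only an illustration: the class name Utf8ByteKind and the method describe are made up for this example and are not part of java.nio. It classifies a byte by its high bits only and does not validate complete sequences.

```java
class Utf8ByteKind {

    /** Classifies a byte by the UTF-8 bit pattern of its high bits (no full validation). */
    static String describe(byte b) {
        int v = b & 0xFF; // view the signed Java byte as a value in 0..255
        if ((v & 0x80) == 0x00) return "single-byte (ASCII)";       // 0xxxxxxx
        if ((v & 0xC0) == 0x80) return "continuation";              // 10xxxxxx
        if ((v & 0xE0) == 0xC0) return "lead of 2-byte sequence";   // 110xxxxx
        if ((v & 0xF0) == 0xE0) return "lead of 3-byte sequence";   // 1110xxxx
        if ((v & 0xF8) == 0xF0) return "lead of 4-byte sequence";   // 11110xxx
        return "invalid in UTF-8";
    }

    public static void main(String[] args) {
        byte[] tab = {(byte) -30, (byte) -126, (byte) -84}; // the three bytes of the euro sign
        for (byte b : tab) {
            System.out.println(String.format("0x%02X", b & 0xFF) + ": " + describe(b));
        }
    }
}
```

Only the first byte (0xE2) can start a sequence. The other two (0x82 and 0xAC) are continuation bytes, so a buffer that begins with one of them cannot be the start of a valid character.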
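
Edit - When the decoder reports UNDERFLOW, it expects you to add more data to the input buffer and then call decode() again, but you must supply the data again from the start of the UTF-8 sequence you were trying to decode. You cannot continue in the middle of a UTF-8 sequence.
Here is a version that works; it adds one byte from tab in each iteration of the loop: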
final CharsetDecoder dec = Charset.forName("UTF-8").newDecoder();
final CharBuffer chars = CharBuffer.allocate(3);
final byte[] tab = new byte[]{(byte) -30, (byte) -126, (byte) -84}; //char €

final ByteBuffer buffer = ByteBuffer.allocate(10);

for (int i = 0; i < tab.length; i++) {
    // Add the next byte to the buffer
    buffer.put(tab[i]);

    // Remember the current position
    final int pos = buffer.position();

    // Try to decode
    buffer.flip();
    final CoderResult result = dec.decode(buffer, chars, i == 2);
    System.out.println(result);

    if (result.isUnderflow()) {
        // Underflow, prepare the buffer for more writing
        buffer.limit(buffer.capacity());
        buffer.position(pos);
    }
}

dec.flush(chars);
chars.flip();

System.out.println("a" + chars.toString() + "a");
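
The same idea can be written a little more generally with ByteBuffer.compact(), which moves any unconsumed bytes to the front of the buffer so the next decode() call sees the incomplete sequence again from its first byte. This is a minimal sketch rather than a definitive implementation: the class name ChunkedUtf8Decoder, the method decodeInChunks, and the chunkSize parameter are hypothetical names chosen for this example, and it assumes the whole input is already available as a byte array.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class ChunkedUtf8Decoder {

    /** Decodes data as UTF-8, handing it to the decoder chunkSize bytes at a time. */
    static String decodeInChunks(byte[] data, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        if (data.length == 0) {
            return ""; // nothing to decode; flush() must not be called on an unused decoder
        }
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
        // Room for one chunk plus up to 3 leftover bytes of an incomplete sequence
        ByteBuffer in = ByteBuffer.allocate(chunkSize + 4);
        // UTF-8 never decodes to more chars than there are input bytes
        CharBuffer out = CharBuffer.allocate(data.length);

        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int n = Math.min(chunkSize, data.length - offset);
            boolean last = offset + n >= data.length;

            in.put(data, offset, n); // append the next chunk after any leftover bytes
            in.flip();               // switch the buffer to reading
            CoderResult result = dec.decode(in, out, last);
            if (result.isError()) {  // malformed or unmappable input
                throw new IllegalArgumentException("Cannot decode input: " + result);
            }
            in.compact();            // keep unconsumed bytes, switch back to writing
        }
        dec.flush(out);
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] tab = {97, (byte) -30, (byte) -126, (byte) -84, 97, 97};
        for (int step = 1; step <= 4; step++) {
            System.out.println(decodeInChunks(tab, step));
        }
    }
}
```

Because compact() preserves whatever the decoder did not consume, there is no need to remember positions by hand, and the chunk size does not have to divide the input length evenly.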