How many bytes do English and Chinese characters take in Java?

Question

import java.io.UnsupportedEncodingException;

public class TestChar {

    public static void main(String[] args) throws UnsupportedEncodingException {
        String cnStr = "龙";
        String enStr = "a";
        byte[] cnBytes = cnStr.getBytes("UTF-8");
        byte[] enBytes = enStr.getBytes("UTF-8");

        System.out.println("bytes size of Chinese?" + cnBytes.length);
        System.out.println("bytes size of English?" + enBytes.length);

        // In Java, a char takes two bytes. The question is:
        char cnc = '龙'; // will '龙' take two or three bytes?
        char enc = 'a'; // will 'a' take one or two bytes?
    }
}


Output:

   bytes size of Chinese?3

   bytes size of English?1


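Here my JVM is set to UTF-8, and from the output we can see that the Chinese character "龙" takes 3 bytes while the English character "a" takes one byte. My question is: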

In Java, a char takes two bytes. So given char cnc = '龙'; and char enc = 'a';, will cnc take only 2 bytes rather than 3? And will 'a' take two bytes rather than one?

Answer 1

And*_*ner 5

龙 has a code point value of 40857 (U+9F99). That fits in the two bytes of a char.


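Encoding it in UTF-8 takes 3 bytes because not all 2-byte sequences are valid in UTF-8: some bits of every byte are reserved as markers, so the two-byte form only reaches code points up to U+07FF.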
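As a rough illustration of the point above, the following sketch prints the code point of the character and the size of a char, then derives the UTF-8 byte count from the code point alone. The class name `CharSizeDemo` and the helper `utf8Length` are hypothetical names made up for this example, and the helper assumes it is given a valid, non-surrogate code point.

```java
class CharSizeDemo {

    // Hypothetical helper: number of bytes UTF-8 uses for one code point.
    // Assumes a valid Unicode code point that is not a surrogate.
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;     // 7 payload bits (ASCII)
        if (codePoint < 0x800) return 2;    // 11 payload bits
        if (codePoint < 0x10000) return 3;  // 16 payload bits
        return 4;                           // 21 payload bits
    }

    public static void main(String[] args) {
        char cnc = '\u9f99'; // the Chinese character U+9F99, written as an escape
        char enc = 'a';

        System.out.println((int) cnc);        // 40857
        System.out.println(Character.BYTES);  // 2, the size of any char
        System.out.println(utf8Length(cnc));  // 3
        System.out.println(utf8Length(enc));  // 1
    }
}
```

Both variables occupy two bytes as char values; the 3 and 1 only appear once the characters are encoded as UTF-8.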


Answer 2

Jon*_*her 1

Internally, String and char use UTF-16, so both cases are the same: each of these characters is one 16-bit char.

byte[] cnBytes = cnStr.getBytes("UTF-8");

UTF-8 is a variable-length encoding, so the Chinese character needs more bytes because it lies outside the ASCII range.
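To make the contrast concrete, here is a minimal sketch that encodes the same two one-character strings in both UTF-16 and UTF-8 and compares the byte counts. The class name `EncodingLengthDemo` and its two helpers are hypothetical; `StandardCharsets.UTF_16BE` is used instead of `UTF_16` on the assumption that the byte-order mark should not be counted.

```java
import java.nio.charset.StandardCharsets;

class EncodingLengthDemo {

    // Hypothetical helper: bytes needed for s in UTF-16 (big-endian, no byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Hypothetical helper: bytes needed for s in UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String cnStr = "\u9f99"; // the Chinese character U+9F99
        String enStr = "a";

        // Columns: number of chars, UTF-16 bytes, UTF-8 bytes
        System.out.println(cnStr.length() + " " + utf16Bytes(cnStr) + " " + utf8Bytes(cnStr)); // 1 2 3
        System.out.println(enStr.length() + " " + utf16Bytes(enStr) + " " + utf8Bytes(enStr)); // 1 2 1
    }
}
```

Passing a `Charset` constant rather than the string "UTF-8" also avoids the checked `UnsupportedEncodingException`.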
