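I want to represent a Unicode character in Java. Which primitive or class is appropriate for this?
Note that I want to be able to store any Unicode character, which may be too large for a 2-byte char.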
T.J*_*der 11
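char is indeed 16 bits; a char corresponds to a UTF-16 code unit. Characters that don't fit in a single UTF-16 code unit (emoji, for instance) require two chars, known as a surrogate pair.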
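To make the two-chars point concrete, here is a minimal sketch using only standard java.lang methods. The class name SurrogatePairDemo and the helper charsNeeded are made up for illustration, and the emoji is written as an escaped surrogate pair so the file does not depend on source encoding.

```java
class SurrogatePairDemo {
    // Number of UTF-16 code units (chars) needed for one code point: 1 or 2.
    // Hypothetical helper; it just delegates to Character.charCount.
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE02"; // U+1F602 written as its surrogate pair

        // length() counts chars (code units), not characters.
        System.out.println("chars: " + emoji.length());
        // codePointCount() counts actual code points.
        System.out.println("code points: " + emoji.codePointCount(0, emoji.length()));

        // The two chars are the high and low halves of the pair.
        System.out.println("high surrogate? " + Character.isHighSurrogate(emoji.charAt(0)));
        System.out.println("low surrogate? " + Character.isLowSurrogate(emoji.charAt(1)));

        System.out.println("chars needed for 'A': " + charsNeeded('A'));
        System.out.println("chars needed for U+1F602: " + charsNeeded(0x1F602));
    }
}
```

A single char can therefore only ever hold one half of such a character, which is why a wider type is needed to store it as one value.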
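If you need to store code points individually for some reason, you can use an int. It has sufficient room (and then some) for every code point Unicode currently allows, U+0000 through U+10FFFF. That's what the JDK itself uses, for instance in Character.codePointAt(CharSequence seq, int index) and String(int[] codePoints, int offset, int count).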
Gratuitous conversion example (live on ideone):
// One emoji, U+1F602, stored in the String as two chars
String s = "😂";
// Read the full code point starting at char index 0
int emoji = Character.codePointAt(s, 0);
String unumber = "U+" + Integer.toHexString(emoji).toUpperCase();
System.out.println(s + " is code point " + unumber);
// Build a String back from a one-element array of code points
String s2 = new String(new int[] { emoji }, 0, 1);
System.out.println("Code point " + unumber + " converted back to string: " + s2);
System.out.println("Successful round-trip? " + s.equals(s2));
Which outputs:
😂 is code point U+1F602
Code point U+1F602 converted back to string: 😂
Successful round-trip? true
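The same idea extends to a whole string: if you want one int per character, a rough sketch might look like the following. The class CodePointArrayDemo and its two helpers are hypothetical names; codePoints() (Java 8 and later) and the String(int[], int, int) constructor are the standard JDK calls.

```java
class CodePointArrayDemo {
    // Split a string into one int per code point (hypothetical helper).
    static int[] toCodePoints(String s) {
        return s.codePoints().toArray();
    }

    // Rebuild the string from its code points (hypothetical helper).
    static String fromCodePoints(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE02b"; // 'a', U+1F602, 'b'
        int[] codePoints = toCodePoints(text);

        System.out.println("chars: " + text.length());
        System.out.println("code points: " + codePoints.length);
        for (int cp : codePoints) {
            System.out.printf("U+%04X%n", cp);
        }
        System.out.println("round-trip ok? " + text.equals(fromCodePoints(codePoints)));
    }
}
```

Each element of the int[] is a complete character, so indexing into it never lands in the middle of a surrogate pair.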