Reading a text stream code point by code point

Isa*_*bie 6 java unicode

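I am trying to read Unicode code points from a text file in Java. The InputStreamReader class returns the stream's contents one int at a time, which I hoped would do what I want, but it does not combine surrogate pairs.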

My test program:

import java.io.*;
import java.nio.charset.*;

class TestChars {
    public static void main(String args[]) {
        InputStreamReader reader =
            new InputStreamReader(System.in, StandardCharsets.UTF_8);
        try {
            System.out.print("> ");
            int code = reader.read();
            while (code != -1) {
                String s =
                    String.format("Code %x is `%s', %s.",
                                  code,
                                  Character.getName(code),
                                  new String(Character.toChars(code)));
                System.out.println(s);
                code = reader.read();
            }
        } catch (Exception e) {
            // Report the failure instead of silently swallowing it.
            System.err.println("Error: " + e);
        }
    }
}

It behaves as follows:

$ java TestChars 
> keyboard ⌨. pizza 🍕
Code 6b is `LATIN SMALL LETTER K', k.
Code 65 is `LATIN SMALL LETTER E', e.
Code 79 is `LATIN SMALL LETTER Y', y.
Code 62 is `LATIN SMALL LETTER B', b.
Code 6f is `LATIN SMALL LETTER O', o.
Code 61 is `LATIN SMALL LETTER A', a.
Code 72 is `LATIN SMALL LETTER R', r.
Code 64 is `LATIN SMALL LETTER D', d.
Code 20 is `SPACE',  .
Code 2328 is `KEYBOARD', ⌨.
Code 2e is `FULL STOP', ..
Code 20 is `SPACE',  .
Code 70 is `LATIN SMALL LETTER P', p.
Code 69 is `LATIN SMALL LETTER I', i.
Code 7a is `LATIN SMALL LETTER Z', z.
Code 7a is `LATIN SMALL LETTER Z', z.
Code 61 is `LATIN SMALL LETTER A', a.
Code 20 is `SPACE',  .
Code d83c is `HIGH SURROGATES D83C', ?.
Code df55 is `LOW SURROGATES DF55', ?.
Code a is `LINE FEED (LF)', 
.

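My problem is that the surrogate pair making up the pizza emoji is read as two separate values. I would like to read the symbol as a single int and be done with it.

Question: Is there a reader(-like) class that automatically combines surrogate pairs into code points while reading? (And, ideally, throws an exception if the input is malformed.)

I know I could combine the pairs myself, but I would rather avoid reinventing the wheel.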

Sha*_*awn 6

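If you take advantage of the String method that returns a stream of code points, you don't have to deal with surrogate pairs yourself: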

import java.io.*;

class cptest {
    public static void main(String[] args) {
        try (BufferedReader br =
                new BufferedReader(new InputStreamReader(System.in, "UTF-8"))) {
            br.lines().flatMapToInt(String::codePoints).forEach(cptest::print);
        } catch (Exception e) {
            System.err.println("Error: " + e);
        }
    }
    private static void print(int cp) {
        String s = new String(Character.toChars(cp));
        System.out.println("Character " + cp + ": " + s);
    }
}

This produces:

$ java cptest <<< "keyboard ⌨. pizza 🍕"
Character 107: k
Character 101: e
Character 121: y
Character 98: b
Character 111: o
Character 97: a
Character 114: r
Character 100: d
Character 32:  
Character 9000: ⌨
Character 46: .
Character 32:  
Character 112: p
Character 105: i
Character 122: z
Character 122: z
Character 97: a
Character 32:  
Character 127829: 🍕
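Note that `BufferedReader.lines()` strips line terminators, so the approach above never reports the LINE FEED code point, and it does not reject unpaired surrogates. If you need either, the following is a minimal sketch of a reader-like wrapper that combines pairs with `Character.toCodePoint`. The names `CodePointReader` and `CodePointReaderDemo` are hypothetical (nothing like them ships with the JDK, as far as I know), and throwing a plain `IOException` on malformed input is an assumption about what you want.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not a JDK class: reads whole code points from any Reader.
final class CodePointReader implements AutoCloseable {
    private final Reader in;

    CodePointReader(Reader in) {
        this.in = in;
    }

    /** Returns the next code point, or -1 at end of stream. */
    int read() throws IOException {
        int first = in.read();
        if (first == -1) {
            return -1;
        }
        char high = (char) first;
        if (Character.isLowSurrogate(high)) {
            // A low surrogate must never appear before its high surrogate.
            throw new IOException(String.format("Unpaired low surrogate %04x", first));
        }
        if (!Character.isHighSurrogate(high)) {
            return first; // ordinary BMP character
        }
        int second = in.read();
        if (second == -1 || !Character.isLowSurrogate((char) second)) {
            throw new IOException(String.format("Unpaired high surrogate %04x", first));
        }
        return Character.toCodePoint(high, (char) second);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

class CodePointReaderDemo {
    public static void main(String[] args) throws IOException {
        // "\uD83C\uDF55" is the surrogate pair for U+1F355 (the pizza emoji).
        try (CodePointReader reader =
                new CodePointReader(new StringReader("a\uD83C\uDF55\n"))) {
            for (int cp = reader.read(); cp != -1; cp = reader.read()) {
                System.out.printf("Code %x%n", cp);
            }
        }
    }
}
```

To use it on standard input, wrap the `InputStreamReader` from the question, for example `new CodePointReader(new InputStreamReader(System.in, StandardCharsets.UTF_8))`. An `InputStreamReader` built from a `Charset` replaces malformed bytes with U+FFFD instead of failing, so if malformed UTF-8 should also raise an exception, one option is to pass it a `CharsetDecoder` configured with `CodingErrorAction.REPORT`.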