Creating and using strings with surrogate pairs

pet*_*ust 4 java string unicode

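I have to work with code points above U+FFFF (specifically mathematical script characters) and have not found a simple tutorial on how to do this. I want to be able to (a) create Strings containing high code points and (b) iterate over the characters in them. Since a char cannot hold these code points, my code looks like this: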

@Test
public void testSurrogates() throws IOException {
    // creating a string
    StringBuffer sb = new StringBuffer();
    sb.append("a");
    sb.appendCodePoint(120030);
    sb.append("b");
    String s = sb.toString();
    System.out.println("s> "+s+" "+s.length());
    // iterating over string
    int codePointCount = s.codePointCount(0, s.length());
    Assert.assertEquals(3, codePointCount);
    int charIndex = 0;
    for (int i = 0; i < codePointCount; i++) {
        int codepoint = s.codePointAt(charIndex);
        int charCount = Character.charCount(codepoint);
        System.out.println(codepoint+" "+charCount);
        charIndex += charCount;
    }
}

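I am not confident that this is completely correct or the cleanest way to do it. I would have expected a method such as codePointAfter(), but there is only codePointBefore(). Please confirm that this is the right strategy or suggest an alternative.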

Update: thanks to @Jon for the confirmation. I struggled with this, so here are two mistakes to avoid:

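  • There is no direct indexing of code points (i.e. no s.getCodePoint(i)) - you have to iterate through them
  • Casting with (char) silently truncates integers above 0xFFFF, and the resulting bug is not easy to spot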
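
Both pitfalls can be reproduced in a few lines. The following is only a sketch: the class name CodePointPitfalls and the helper codePointAtIndex are made up for illustration, while String.offsetByCodePoints and Character.toChars are standard JDK methods. The helper still scans the string internally, so for a loop over every code point the iteration approach remains the better fit.

```java
public class CodePointPitfalls {

    // Hypothetical helper: look up a code point by its code point index
    // (not its char index). offsetByCodePoints walks the string, so this is O(n).
    static int codePointAtIndex(String s, int codePointIndex) {
        int charIndex = s.offsetByCodePoints(0, codePointIndex);
        return s.codePointAt(charIndex);
    }

    public static void main(String[] args) {
        int cp = 120030; // U+1D4DE, the code point used in the question
        String s = "a" + new String(Character.toChars(cp)) + "b";

        // Pitfall 2: the cast keeps only the low 16 bits of the int.
        char truncated = (char) cp;
        System.out.println(Integer.toHexString(truncated)); // d4de

        // Character.toChars produces the surrogate pair instead.
        System.out.println(Character.toChars(cp).length); // 2

        // Pitfall 1: char index 2 is the low surrogate, not 'b'.
        System.out.println(s.charAt(2) == 'b'); // false
        System.out.println(codePointAtIndex(s, 2) == 'b'); // true
    }
}
```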

Jon*_*eet 5

It looks correct to me. If you want to iterate over the code points in a string, you could wrap this code in an Iterable:

// Needs: import java.util.Iterator; import java.util.NoSuchElementException;
public static Iterable<Integer> getCodePoints(final String text) {
    return new Iterable<Integer>() {
        @Override public Iterator<Integer> iterator() {
            return new Iterator<Integer>() {
                private int nextIndex = 0;

                @Override public boolean hasNext() {
                    return nextIndex < text.length();
                }

                @Override public Integer next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    int codePoint = text.codePointAt(nextIndex);
                    nextIndex += Character.charCount(codePoint);
                    return codePoint;
                }

                @Override public void remove() {
                    throw new UnsupportedOperationException();
                }
            };
        }
    };
}

Or you could change the method to just return an int[], of course:

public static int[] getCodePoints(String text) {
    int[] ret = new int[text.codePointCount(0, text.length())];
    int charIndex = 0;
    for (int i = 0; i < ret.length; i++) {
        ret[i] = text.codePointAt(charIndex);
        charIndex += Character.charCount(ret[i]);
    }
    return ret;
}
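
Going the other way, from an int[] of code points back to a String, can be done with the String(int[] codePoints, int offset, int count) constructor. This is a minimal sketch; the class name CodePointArrayRoundTrip and the helper fromCodePoints are hypothetical names chosen for the example.

```java
public class CodePointArrayRoundTrip {

    // Hypothetical helper: build a String from code points; values above
    // 0xFFFF are encoded as surrogate pairs by the constructor.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String s = fromCodePoints('a', 120030, 'b');
        System.out.println(s.length());                        // 4 chars
        System.out.println(s.codePointCount(0, s.length()));   // 3 code points
    }
}
```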

I agree that it's a pity the Java libraries don't expose methods like this already, but at least they're not hard to write.
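
For readers on Java 8 or later, CharSequence.codePoints() returns an IntStream of code points and covers both variants above. The sketch below assumes such a runtime; the class name CodePointStreamDemo and the helper toCodePoints are illustrative and not part of the original answer.

```java
import java.util.Arrays;

public class CodePointStreamDemo {

    // Hypothetical helper: the int[] variant, written with the Java 8+ stream API.
    static int[] toCodePoints(String text) {
        return text.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = new StringBuilder()
                .append('a')
                .appendCodePoint(120030) // U+1D4DE, stored as a surrogate pair
                .append('b')
                .toString();

        System.out.println(Arrays.toString(toCodePoints(s))); // [97, 120030, 98]

        // Iterate directly, printing each code point and its char count.
        s.codePoints().forEach(cp ->
                System.out.println(cp + " " + Character.charCount(cp)));
    }
}
```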