Sorting the Macedonian alphabet with a Collator

Dav*_*ove 10 java collation

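I am trying to sort a set of strings written in the Macedonian alphabet. I know how to do it, but the result is not what I expected. Here is my test program: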

import java.text.Collator;
import java.util.LinkedList;
import java.util.List;
import java.util.Locale;

public class Main {

    // The 31 letters of the Macedonian alphabet, in the expected alphabetical order
    private static final char[] ALPHABET_ARRAY = {
        'а', 'б', 'в', 'г', 'д', 'ѓ', 'е', 'ж', 'з', 'ѕ', 'и', 'ј', 'к', 'л', 'љ', 'м', 'н', 'њ', 'о', 'п', 'р', 'с', 'т', 'ќ', 'у', 'ф', 'х', 'ц', 'ч', 'џ', 'ш' };

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(new Locale("mk", "MK"));
        List<String> list = new LinkedList<>();
        for (int i = 0; i < ALPHABET_ARRAY.length; i++) {
            list.add("" + ALPHABET_ARRAY[i]);
        }
        list.sort(collator::compare);
        list.forEach(System.out::print);
    }
}

The letters in ALPHABET_ARRAY are entered in the correct alphabetical order, but the program prints:

абвгѓдежзѕијкќлљмнњопрстуфхцчџш

but it should be:

абвгдѓежзѕијклљмнњопрстќуфхцчџш

Is there a problem with the Macedonian collator in Java, or am I doing something wrong?

Ole*_*hin 4

The collator for the "mk_MK" locale is based on the sun.text.resources.mk.CollationData_mk resource (see the CollationData_mk.java source in the jdk8u repository at tag jdk8u92-b14).


The collation rules in CollationData_mk clearly place "ѓ" after "г" and "ќ" after "к".
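
One way to observe this without reading the JDK sources is to ask the default collator directly. The following is only a minimal sketch: the class name DefaultMkCollatorProbe and its helper methods are made up for illustration, and which rules actually back the collator may depend on the JDK version and locale provider in use. The letters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.text.Collator;
import java.util.Locale;

class DefaultMkCollatorProbe {

    // Collator that the JDK supplies for Macedonian; the rules behind it
    // may differ between JDK versions and locale providers.
    static Collator defaultCollator() {
        return Collator.getInstance(Locale.forLanguageTag("mk-MK"));
    }

    // True if the collator sorts 'first' strictly before 'second'.
    static boolean sortsBefore(Collator collator, String first, String second) {
        return collator.compare(first, second) < 0;
    }

    public static void main(String[] args) {
        Collator c = defaultCollator();
        // Code points used: 0453 = gje, 0434 = de, 045c = kje, 043b = el.
        System.out.println("gje sorts before de: " + sortsBefore(c, "\u0453", "\u0434"));
        System.out.println("kje sorts before el: " + sortsBefore(c, "\u045c", "\u043b"));
    }
}
```

On a JDK whose Macedonian collator uses the rules described above, both lines would be expected to report true, which matches the output shown in the question.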


Since a RuleBasedCollator can be created with custom rules, the simplest way to get the desired sort order is to slightly modify the rules from CollationData_mk:

import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

public static Collator createMacedonianCollator() throws ParseException {
    // the defaults are defined in non-public sun.util.locale.provider.CollationRules
    // they are used internally in sun.util.locale.provider.CollatorProviderImpl
    // we have no direct access to proper defaults, so we will simply comment entries which depend on them
    String DEFAULTRULES = "";
    // we will move the entries for ѓ and ќ only, leaving everything else as is
    return new RuleBasedCollator( DEFAULTRULES +
            //"& 9 < \u0482 " +       // thousand sign
            //"& Z " +                // Arabic script sorts after Z's
            "< \u0430 , \u0410" +   // a
            "< \u0431 , \u0411" +   // be
            "< \u0432 , \u0412" +   // ve
            "< \u0433 , \u0413" +   // ghe
            "; \u0491 , \u0490" +   // ghe-upturn
            "; \u0495 , \u0494" +   // ghe-mid-hook
            /*!!!moved after д/de!!!*/ //"; \u0453 , \u0403" +   // gje
            "; \u0493 , \u0492" +   // ghe-stroke
            "< \u0434 , \u0414" +   // de
            /*!!!moved AND relation strength changed!!!*/ "< \u0453 , \u0403" +   // gje
            "< \u0452 , \u0402" +   // dje
            "< \u0435 , \u0415" +   // ie
            "; \u04bd , \u04bc" +   // che
            "; \u0451 , \u0401" +   // io
            "; \u04bf , \u04be" +   // che-descender
            "< \u0454 , \u0404" +   // uk ie
            "< \u0436 , \u0416" +   // zhe
            "; \u0497 , \u0496" +   // zhe-descender
            "; \u04c2 , \u04c1" +   // zhe-breve
            "< \u0437 , \u0417" +   // ze
            "; \u0499 , \u0498" +   // zh-descender
            "< \u0455 , \u0405" +   // dze
            "< \u0438 , \u0418" +   // i
            "< \u0456 , \u0406" +   // uk/bg i
            "; \u04c0 " +           // palochka
            "< \u0457 , \u0407" +   // uk yi
            "< \u0439 , \u0419" +   // short i
            "< \u0458 , \u0408" +   // je
            "< \u043a , \u041a" +   // ka
            "; \u049f , \u049e" +   // ka-stroke
            "; \u04c4 , \u04c3" +   // ka-hook
            "; \u049d , \u049c" +   // ka-vt-stroke
            "; \u04a1 , \u04a0" +   // bashkir-ka
            /*!!!moved after т/te!!!*/ //"; \u045c , \u040c" +   // kje
            "; \u049b , \u049a" +   // ka-descender
            "< \u043b , \u041b" +   // el
            "< \u0459 , \u0409" +   // lje
            "< \u043c , \u041c" +   // em
            "< \u043d , \u041d" +   // en
            "; \u0463 " +           // yat
            "; \u04a3 , \u04a2" +   // en-descender
            "; \u04a5 , \u04a4" +   // en-ghe
            "; \u04bb , \u04ba" +   // shha
            "; \u04c8 , \u04c7" +   // en-hook
            "< \u045a , \u040a" +   // nje
            "< \u043e , \u041e" +   // o
            "; \u04a9 , \u04a8" +   // ha
            "< \u043f , \u041f" +   // pe
            "; \u04a7 , \u04a6" +   // pe-mid-hook
            "< \u0440 , \u0420" +   // er
            "< \u0441 , \u0421" +   // es
            "; \u04ab , \u04aa" +   // es-descender
            "< \u0442 , \u0422" +   // te
            "; \u04ad , \u04ac" +   // te-descender
            "< \u045b , \u040b" +   // tshe
            /*!!!moved AND relation strength changed!!!*/ "< \u045c , \u040c" +   // kje
            "< \u0443 , \u0423" +   // u
            "; \u04af , \u04ae" +   // straight u
            "< \u045e , \u040e" +   // short u
            "< \u04b1 , \u04b0" +   // straight u-stroke
            "< \u0444 , \u0424" +   // ef
            "< \u0445 , \u0425" +   // ha
            "; \u04b3 , \u04b2" +   // ha-descender
            "< \u0446 , \u0426" +   // tse
            "; \u04b5 , \u04b4" +   // te tse
            "< \u0447 , \u0427" +   // che
            "; \u04b7 ; \u04b6" +   // che-descender
            "; \u04b9 , \u04b8" +   // che-vt-stroke
            "; \u04cc , \u04cb" +   // che
            "< \u045f , \u040f" +   // dzhe
            "< \u0448 , \u0428" +   // sha
            "< \u0449 , \u0429" +   // shcha
            "< \u044a , \u042a" +   // hard sign
            "< \u044b , \u042b" +   // yeru
            "< \u044c , \u042c" +   // soft sign
            "< \u044d , \u042d" +   // e
            "< \u044e , \u042e" +   // yu
            "< \u044f , \u042f" +   // ya
            "< \u0461 , \u0460" +   // omega
            "< \u0462 " +           // yat
            "< \u0465 , \u0464" +   // iotified e
            "< \u0467 , \u0466" +   // little yus
            "< \u0469 , \u0468" +   // iotified little yus
            "< \u046b , \u046a" +   // big yus
            "< \u046d , \u046c" +   // iotified big yus
            "< \u046f , \u046e" +   // ksi
            "< \u0471 , \u0470" +   // psi
            "< \u0473 , \u0472" +   // fita
            "< \u0475 , \u0474" +   // izhitsa
            "; \u0477 , \u0476" +   // izhitsa-double-grave
            "< \u0479 , \u0478" +   // uk
            "< \u047b , \u047a" +   // round omega
            "< \u047d , \u047c" +   // omega-titlo
            "< \u047f , \u047e" +   // ot
            "< \u0481 , \u0480"     // koppa
    );
}

The rules can be simplified further to contain only the 31 basic letters, without the accented variants.
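
A minimal sketch of what such a simplified collator might look like follows. It is an illustration under stated assumptions, not the JDK's own rule set: the class name SimpleMacedonianCollator, the ALPHABET constant, and the create and sorted helpers are all hypothetical. The rule string is generated from the 31 letters in the order the question expects (абвгдѓежзѕијклљмнњопрстќуфхцчџш), written as Unicode escapes so the source is encoding-independent. Each lowercase letter is paired with its uppercase form using a tertiary difference.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SimpleMacedonianCollator {

    // The 31 lowercase letters in the order the question expects
    // (assumption: this is the desired alphabetical order).
    static final String ALPHABET =
            "\u0430\u0431\u0432\u0433\u0434\u0453\u0435\u0436\u0437\u0455"
          + "\u0438\u0458\u043a\u043b\u0459\u043c\u043d\u045a\u043e\u043f"
          + "\u0440\u0441\u0442\u045c\u0443\u0444\u0445\u0446\u0447\u045f"
          + "\u0448";

    // Builds rules of the form "< a , A < b , B ...": a primary difference
    // between letters, a tertiary (case) difference within each pair.
    static Collator create() {
        StringBuilder rules = new StringBuilder();
        for (char lower : ALPHABET.toCharArray()) {
            rules.append("< ").append(lower)
                 .append(" , ").append(Character.toUpperCase(lower)).append(' ');
        }
        try {
            return new RuleBasedCollator(rules.toString());
        } catch (ParseException e) {
            // Not expected for this fixed, generated rule string.
            throw new IllegalStateException(e);
        }
    }

    // Returns a sorted copy of the input, leaving the original list untouched.
    static List<String> sorted(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(create());
        return copy;
    }

    public static void main(String[] args) {
        List<String> letters = new ArrayList<>(Arrays.asList(ALPHABET.split("")));
        Collections.shuffle(letters);
        // How the letters render depends on the console encoding.
        System.out.println(String.join("", sorted(letters)));
    }
}
```

Because the rules mention only these 31 letters and their capitals, any other character is left to the collator's handling of unmapped characters, so this variant is probably only suitable when the input is known to be plain Macedonian text.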
