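I am trying to sort a list of strings written in the Macedonian alphabet. I know how to do it, but the result is not what I expect. Here is my test program: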
import java.text.Collator;
import java.util.LinkedList;
import java.util.List;
import java.util.Locale;

public class Main {
    // The 31 letters of the Macedonian alphabet, in alphabetical order.
    private static final char[] ALPHABET_ARRAY = {
        'а', 'б', 'в', 'г', 'д', 'ѓ', 'е', 'ж', 'з', 'ѕ', 'и', 'ј', 'к', 'л', 'љ', 'м',
        'н', 'њ', 'о', 'п', 'р', 'с', 'т', 'ќ', 'у', 'ф', 'х', 'ц', 'ч', 'џ', 'ш' };

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(new Locale("mk", "MK"));
        List<String> list = new LinkedList<>();
        for (int i = 0; i < ALPHABET_ARRAY.length; i++) {
            list.add("" + ALPHABET_ARRAY[i]);
        }
        list.sort(collator::compare);
        list.forEach(System.out::print);
    }
}
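The letters in ALPHABET_ARRAY are entered in the correct alphabetical order, but the program prints
абвгѓдежзѕијкќлљмнњопрстуфхцчџш
when it should print:
абвгдѓежзѕијклљмнњопрстќуфхцчџш
Is there a problem with the Macedonian collator in Java, or am I doing something wrong?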
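The collator for the "mk_MK" locale is based on the sun.text.resources.mk.CollationData_mk resource (see the CollationData_mk.java source at tag jdk8u92-b14 in the jdk8u repository).
The collation rules in CollationData_mk explicitly place ѓ after г and ќ after к.
Since a RuleBasedCollator can be created with custom rules, the simplest way to get the desired sort order is to slightly modify the rules from CollationData_mk: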
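To see how the collator in your own runtime orders these letters, a small check along the following lines may help. This is only a sketch: the class name `InspectMacedonianCollator` and the helper `relation` are made up for illustration, and the printed result may depend on the JDK version and on which locale data it uses.

```java
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

public class InspectMacedonianCollator {

    // Hypothetical helper: reports how the collator orders a relative to b.
    public static String relation(Collator collator, String a, String b) {
        int cmp = collator.compare(a, b);
        return cmp < 0 ? "before" : cmp > 0 ? "after" : "equal";
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("mk-MK"));

        // gje (U+0453) against de (U+0434), and kje (U+045C) against el (U+043B)
        System.out.println("gje sorts " + relation(collator, "\u0453", "\u0434") + " de");
        System.out.println("kje sorts " + relation(collator, "\u045c", "\u043b") + " el");

        // The built-in collators are usually RuleBasedCollator instances, but the
        // API does not promise that, so check before casting.
        if (collator instanceof RuleBasedCollator) {
            String rules = ((RuleBasedCollator) collator).getRules();
            System.out.println("rule string length: " + rules.length());
        }
    }
}
```

If gje is reported as sorting before de, the runtime is using rules like the ones described above.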
// requires: java.text.Collator, java.text.ParseException, java.text.RuleBasedCollator
public static Collator createMacedonianCollator() throws ParseException {
    // the defaults are defined in non-public sun.util.locale.provider.CollationRules
    // they are used internally in sun.util.locale.provider.CollatorProviderImpl
    // we have no direct access to proper defaults, so we will simply comment entries which depend on them
    String DEFAULTRULES = "";
    // we will move the entries for ѓ and ќ only, leaving everything else as is
    return new RuleBasedCollator( DEFAULTRULES +
        //"& 9 < \u0482 " +       // thousand sign
        //"& Z " +                // Arabic script sorts after Z's
        "< \u0430 , \u0410" +     // a
        "< \u0431 , \u0411" +     // be
        "< \u0432 , \u0412" +     // ve
        "< \u0433 , \u0413" +     // ghe
        "; \u0491 , \u0490" +     // ghe-upturn
        "; \u0495 , \u0494" +     // ghe-mid-hook
        /*!!!moved after д/de!!!*/ //"; \u0453 , \u0403" + // gje
        "; \u0493 , \u0492" +     // ghe-stroke
        "< \u0434 , \u0414" +     // de
        /*!!!moved AND relation strength changed!!!*/ "< \u0453 , \u0403" + // gje
        "< \u0452 , \u0402" +     // dje
        "< \u0435 , \u0415" +     // ie
        "; \u04bd , \u04bc" +     // che
        "; \u0451 , \u0401" +     // io
        "; \u04bf , \u04be" +     // che-descender
        "< \u0454 , \u0404" +     // uk ie
        "< \u0436 , \u0416" +     // zhe
        "; \u0497 , \u0496" +     // zhe-descender
        "; \u04c2 , \u04c1" +     // zhe-breve
        "< \u0437 , \u0417" +     // ze
        "; \u0499 , \u0498" +     // zh-descender
        "< \u0455 , \u0405" +     // dze
        "< \u0438 , \u0418" +     // i
        "< \u0456 , \u0406" +     // uk/bg i
        "; \u04c0 " +             // palochka
        "< \u0457 , \u0407" +     // uk yi
        "< \u0439 , \u0419" +     // short i
        "< \u0458 , \u0408" +     // je
        "< \u043a , \u041a" +     // ka
        "; \u049f , \u049e" +     // ka-stroke
        "; \u04c4 , \u04c3" +     // ka-hook
        "; \u049d , \u049c" +     // ka-vt-stroke
        "; \u04a1 , \u04a0" +     // bashkir-ka
        /*!!!moved after т/te!!!*/ //"; \u045c , \u040c" + // kje
        "; \u049b , \u049a" +     // ka-descender
        "< \u043b , \u041b" +     // el
        "< \u0459 , \u0409" +     // lje
        "< \u043c , \u041c" +     // em
        "< \u043d , \u041d" +     // en
        "; \u0463 " +             // yat
        "; \u04a3 , \u04a2" +     // en-descender
        "; \u04a5 , \u04a4" +     // en-ghe
        "; \u04bb , \u04ba" +     // shha
        "; \u04c8 , \u04c7" +     // en-hook
        "< \u045a , \u040a" +     // nje
        "< \u043e , \u041e" +     // o
        "; \u04a9 , \u04a8" +     // ha
        "< \u043f , \u041f" +     // pe
        "; \u04a7 , \u04a6" +     // pe-mid-hook
        "< \u0440 , \u0420" +     // er
        "< \u0441 , \u0421" +     // es
        "; \u04ab , \u04aa" +     // es-descender
        "< \u0442 , \u0422" +     // te
        "; \u04ad , \u04ac" +     // te-descender
        "< \u045b , \u040b" +     // tshe
        /*!!!moved AND relation strength changed!!!*/ "< \u045c , \u040c" + // kje
        "< \u0443 , \u0423" +     // u
        "; \u04af , \u04ae" +     // straight u
        "< \u045e , \u040e" +     // short u
        "< \u04b1 , \u04b0" +     // straight u-stroke
        "< \u0444 , \u0424" +     // ef
        "< \u0445 , \u0425" +     // ha
        "; \u04b3 , \u04b2" +     // ha-descender
        "< \u0446 , \u0426" +     // tse
        "; \u04b5 , \u04b4" +     // te tse
        "< \u0447 , \u0427" +     // che
        "; \u04b7 ; \u04b6" +     // che-descender
        "; \u04b9 , \u04b8" +     // che-vt-stroke
        "; \u04cc , \u04cb" +     // che
        "< \u045f , \u040f" +     // dzhe
        "< \u0448 , \u0428" +     // sha
        "< \u0449 , \u0429" +     // shcha
        "< \u044a , \u042a" +     // hard sign
        "< \u044b , \u042b" +     // yeru
        "< \u044c , \u042c" +     // soft sign
        "< \u044d , \u042d" +     // e
        "< \u044e , \u042e" +     // yu
        "< \u044f , \u042f" +     // ya
        "< \u0461 , \u0460" +     // omega
        "< \u0462 " +             // yat
        "< \u0465 , \u0464" +     // iotified e
        "< \u0467 , \u0466" +     // little yus
        "< \u0469 , \u0468" +     // iotified little yus
        "< \u046b , \u046a" +     // big yus
        "< \u046d , \u046c" +     // iotified big yus
        "< \u046f , \u046e" +     // ksi
        "< \u0471 , \u0470" +     // psi
        "< \u0473 , \u0472" +     // fita
        "< \u0475 , \u0474" +     // izhitsa
        "; \u0477 , \u0476" +     // izhitsa-double-grave
        "< \u0479 , \u0478" +     // uk
        "< \u047b , \u047a" +     // round omega
        "< \u047d , \u047c" +     // omega-titlo
        "< \u047f , \u047e" +     // ot
        "< \u0481 , \u0480"       // koppa
    );
}

The rules can be simplified further to contain only the 31 basic letters, without the accented variants.
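As a rough sketch of that simplification, the rule string can be generated from the 31 letters in the order given in the question. The class name `SimpleMacedonianCollator` and its methods are invented for this example; the letters are written as Unicode escapes so the source compiles regardless of file encoding. Note that characters not mentioned in the rules (digits, Latin letters, other Cyrillic letters) are not given any deliberate order here.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SimpleMacedonianCollator {

    // The 31 lowercase Macedonian letters in dictionary order:
    // a be ve ghe de gje ie zhe ze dze / i je ka el lje em en nje o pe /
    // er es te kje u ef ha tse che dzhe sha
    public static final String ALPHABET =
            "\u0430\u0431\u0432\u0433\u0434\u0453\u0435\u0436\u0437\u0455"
          + "\u0438\u0458\u043a\u043b\u0459\u043c\u043d\u045a\u043e\u043f"
          + "\u0440\u0441\u0442\u045c\u0443\u0444\u0445\u0446\u0447\u045f\u0448";

    // Builds rules of the form "< x , X" for each letter: x starts a new
    // primary position and X differs from it only by case.
    public static RuleBasedCollator create() {
        StringBuilder rules = new StringBuilder();
        for (char lower : ALPHABET.toCharArray()) {
            rules.append("< ").append(lower)
                 .append(" , ").append(Character.toUpperCase(lower)).append(' ');
        }
        try {
            return new RuleBasedCollator(rules.toString());
        } catch (ParseException e) {
            throw new IllegalStateException("invalid collation rules", e);
        }
    }

    // Reverses the alphabet, sorts it with the collator, and joins the result.
    public static String sortedLetters() {
        List<String> letters = new ArrayList<>();
        for (char c : ALPHABET.toCharArray()) {
            letters.add(String.valueOf(c));
        }
        Collections.reverse(letters);
        letters.sort(create());
        return String.join("", letters);
    }

    public static void main(String[] args) {
        // Checks that sorting restores the original alphabetical order.
        System.out.println(sortedLetters().equals(ALPHABET));
    }
}
```

The test program from the question can use `SimpleMacedonianCollator.create()` in place of `Collator.getInstance(new Locale("mk", "MK"))`.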