HTML ASCII case-insensitive ICU Collator

ada*_*ter 7 java icu icu4j

I need to create a Collator that corresponds to https://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive/, i.e. one that ignores the case of the ASCII characters A-Z and a-z when comparing.

I tried the following ICU4J RuleBasedCollator:

final RuleBasedCollator collator =
        new RuleBasedCollator("&a=A, b=B, c=C, d=D, e=E, f=F, g=G, h=H, "
                + "i=I, j=J, k=K, l=L, m=M, n=N, o=O, p=P, q=Q, r=R, s=S, t=T, "
                + "u=U, v=V, u=U, v=V, w=W, x=X, y=Y, z=Z").freeze();

However, the following search seems to fail, although I expected it to succeed (i.e. return true):

final SearchIterator searchIterator = new StringSearch(
        "pu", new StringCharacterIterator("iNPut"), collator);
return searchIterator.first() >= 0;

What is missing from my rules?

Par*_*oob 2

com.ibm.icu.text.RuleBasedCollator#compare

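Returns an integer value. The value is less than zero if source is less than target, zero if source and target are equal, and greater than zero if source is greater than target.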

String a = "Pu";
String b = "pu";

RuleBasedCollator c1 = (RuleBasedCollator) Collator.getInstance(new Locale("en", "US", ""));
RuleBasedCollator c2 = new RuleBasedCollator("& p=P");
System.out.println(c1.compare(a, b) == 0);
System.out.println(c2.compare(a, b) == 0);
Output
======
false
true

So it seems the rules are not the problem; something appears to be wrong with the SearchIterator code.


If you don't have to use SearchIterator, then perhaps you could write your own "contains" method. Maybe something like this:

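// Returns true if some substring of a compares equal to b under collator c.
// Assumes a match has the same number of chars as b, which holds when the
// collator only equates ASCII upper- and lower-case letters.
boolean contains(String a, String b, RuleBasedCollator c) {
  // Slide a window of b.length() chars over a and compare each window to b.
  for (int start = 0; start + b.length() <= a.length(); start++) {
    if (c.compare(a.substring(start, start + b.length()), b) == 0) {
      return true;
    }
  }
  return false;
}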

Probably not the best code in the world, but you get the idea.
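If ICU is not strictly required for the search itself, the same idea can be sketched with the plain JDK by folding only the ASCII letters A-Z to a-z before comparing, which is my reading of what the HTML ASCII case-insensitive collation asks for. This is a swapped-in technique, not an ICU Collator, and the class and method names (`AsciiCaseInsensitive`, `foldAscii`, `regionMatches`, `indexOf`) are made up for illustration:

```java
final class AsciiCaseInsensitive {

    // Folds only ASCII 'A'..'Z' to 'a'..'z'; every other char is returned unchanged.
    static char foldAscii(char ch) {
        return (ch >= 'A' && ch <= 'Z') ? (char) (ch + ('a' - 'A')) : ch;
    }

    // True if text, starting at offset, equals pattern after ASCII-only folding.
    // The caller must ensure offset + pattern.length() <= text.length().
    static boolean regionMatches(String text, int offset, String pattern) {
        for (int i = 0; i < pattern.length(); i++) {
            if (foldAscii(text.charAt(offset + i)) != foldAscii(pattern.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Index of the first ASCII-case-insensitive occurrence of pattern in text, or -1.
    static int indexOf(String text, String pattern) {
        for (int start = 0; start + pattern.length() <= text.length(); start++) {
            if (regionMatches(text, start, pattern)) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The example from the question: "pu" is found inside "iNPut".
        System.out.println(indexOf("iNPut", "pu") >= 0);
        // U+212A (KELVIN SIGN) is not an ASCII letter, so it is not folded to 'k'.
        System.out.println(indexOf("kelvin", "\u212Aelvin") >= 0);
    }
}
```

Unlike `String.equalsIgnoreCase` or `String.regionMatches(true, ...)`, which apply Unicode case mappings, this sketch leaves every non-ASCII character untouched, so the second call above does not report a match.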