Consider this program:
import java.util.regex.Pattern;

public class xx {
    /*
     * Ñ
     * LATIN CAPITAL LETTER N WITH TILDE
     * Unicode: U+00D1, UTF-8: C3 91
     */
    public static final String BIG_N = "\u00d1";

    /*
     * ñ
     * LATIN SMALL LETTER N WITH TILDE
     * Unicode: U+00F1, UTF-8: C3 B1
     */
    public static final String LITTLE_N = "\u00f1";

    public static void main(String[] args) throws Exception {
        System.out.println(BIG_N.equalsIgnoreCase(LITTLE_N));
        System.out.println(Pattern.compile(BIG_N, Pattern.CASE_INSENSITIVE).matcher(LITTLE_N).matches());
    }
}
Since Ñ is the uppercase version of ñ, you might expect it to print:
true
true
But what it actually prints (Java 1.7.0_17-b02) is:
true
false
Why?
Bri*_*ach 17
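By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag in conjunction with this flag.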
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#CASE_INSENSITIVE
For completeness: you combine the two flags with a bitwise OR (|).
Pattern.compile(BIG_N, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
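To see the difference side by side, here is a minimal sketch that runs the same match three ways: with CASE_INSENSITIVE alone, with CASE_INSENSITIVE | UNICODE_CASE, and with the equivalent embedded flags (?iu) in the pattern itself. The class name UnicodeCaseDemo and the helper matchesWithFlags are made up for this illustration and are not part of the question's code.

```java
import java.util.regex.Pattern;

public class UnicodeCaseDemo {
    // Hypothetical helper: compiles the regex with the given flags and
    // reports whether it matches the entire input.
    static boolean matchesWithFlags(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String bigN = "\u00d1";    // Ñ, LATIN CAPITAL LETTER N WITH TILDE
        String littleN = "\u00f1"; // ñ, LATIN SMALL LETTER N WITH TILDE

        // CASE_INSENSITIVE alone only folds case for US-ASCII characters.
        System.out.println(matchesWithFlags(bigN, littleN,
                Pattern.CASE_INSENSITIVE)); // false

        // Adding UNICODE_CASE makes the case folding Unicode-aware.
        System.out.println(matchesWithFlags(bigN, littleN,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)); // true

        // The same two flags written as embedded flag expressions:
        // (?i) is CASE_INSENSITIVE, (?u) is UNICODE_CASE.
        System.out.println(matchesWithFlags("(?iu)" + bigN, littleN, 0)); // true
    }
}
```

The embedded form can be handy when the pattern string is all you control, for example when it is passed to String.matches or read from configuration.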