Regex and the capital I in some cultures

Mir*_*sik 24 c# regex

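What is wrong with the capital "I" in some cultures? I found that in some cultures it cannot be matched under one specific condition: when you search for [a-z] with the RegexOptions.IgnoreCase flag. Here is sample code: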

using System;
using System.Globalization;
using System.Text.RegularExpressions;
using System.Threading;

var allCultures = CultureInfo.GetCultures(CultureTypes.AllCultures);
var allLetters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
var allLettersCount = allLetters.Length;

foreach (var culture in allCultures)
{
    // Run the same match under each culture in turn.
    Thread.CurrentThread.CurrentCulture = culture;
    Thread.CurrentThread.CurrentUICulture = culture;

    // Concatenate every character the pattern matched.
    var matched = string.Empty;
    foreach (Match m in Regex.Matches(allLetters, "[A-Za-z0-9]", RegexOptions.IgnoreCase))
        matched += m.Value;

    // Report any culture in which at least one character failed to match.
    var count = matched.Length;
    if (count != allLettersCount)
        Console.WriteLine("Culture '{0}' - {1} missing; Matched: {2}", culture.Name, allLettersCount - count, matched);
}

The output is (note that the capital I is missing in every line):

Culture 'az' - 1 missing; Matched:          ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
Culture 'az-Cyrl' - 1 missing; Matched:     ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
Culture 'az-Cyrl-AZ' - 1 missing; Matched:  ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
Culture 'az-Latn' - 1 missing; Matched:     ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
Culture 'az-Latn-AZ' - 1 missing; Matched:  ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
Culture 'tr' - 1 missing; Matched:          ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
Culture 'tr-TR' - 1 missing; Matched:       ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789

Interestingly, if the "IgnoreCase" flag is not used, it works fine and the "I" is found.

Wik*_*żew 18

The answer is on Wikipedia:

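The casing of the dotless and dotted I forms differs from that of other languages. This means that a case-insensitive match expected by English speakers does not agree with what Turkish users would expect. The "Turkish I" is often used as an example of the problems with case insensitivity in computing.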
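
To make the quoted point concrete, here is a minimal sketch of how the lowercase form of 'I' depends on the culture used for casing. The class name TurkishIDemo and the helper LowerIn are made up for illustration. On a runtime with full culture data, I would expect the tr-TR line to print U+0131 (dotless ı) instead of U+0069 (i), which would explain why the lowercased 'I' no longer falls inside [a-z]. Exact regex internals differ between .NET versions, so treat this as an illustration of the casing rule only.

```csharp
using System;
using System.Globalization;

public static class TurkishIDemo
{
    // Hypothetical helper: lowercases one character using the casing
    // rules of the named culture.
    public static char LowerIn(string cultureName, char c) =>
        char.ToLower(c, CultureInfo.GetCultureInfo(cultureName));

    public static void Main()
    {
        foreach (var name in new[] { "en-US", "tr-TR" })
        {
            char lower = LowerIn(name, 'I');
            // Print the code point so the dotless form is easy to tell apart.
            Console.WriteLine("{0}: 'I' -> U+{1:X4}", name, (int)lower);
        }
    }
}
```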

Another explanation can be found on MSDN:

[image: screenshot of the MSDN explanation, not preserved]
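
If the goal is simply to match ASCII letters regardless of the current culture, one option is to combine RegexOptions.IgnoreCase with RegexOptions.CultureInvariant, so that case folding uses the invariant culture. The sketch below assumes that is acceptable for your data. The class name InvariantRegexDemo and the helper CountMatches are made up for illustration, and the culture is switched only to reproduce the scenario from the question.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class InvariantRegexDemo
{
    const string AllLetters =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Hypothetical helper: counts matches while the current culture is
    // temporarily set to cultureName, then restores the previous culture.
    public static int CountMatches(string cultureName, RegexOptions options)
    {
        var saved = CultureInfo.CurrentCulture;
        try
        {
            CultureInfo.CurrentCulture = CultureInfo.GetCultureInfo(cultureName);
            return Regex.Matches(AllLetters, "[A-Za-z0-9]", options).Count;
        }
        finally
        {
            CultureInfo.CurrentCulture = saved;
        }
    }

    public static void Main()
    {
        // Culture-sensitive folding, as in the question.
        Console.WriteLine(CountMatches("tr-TR", RegexOptions.IgnoreCase));

        // Invariant folding: independent of the current culture.
        Console.WriteLine(CountMatches(
            "tr-TR", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant));
    }
}
```

With the invariant option all 62 characters should match whatever the thread culture is. Since the pattern already lists both A-Z and a-z, dropping IgnoreCase altogether is another way out, as the question itself observes.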