FirstUnmatchedIndex using CurrentCultureIgnoreCase

mad*_*ang 5 c# string

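I need to support languages that use non-ASCII letters in the input text, so I need to implement StringComparison.CurrentCultureIgnoreCase for FirstUnmatchedIndex. Ignoring case is not that hard, but I don't know how to convert combined symbols (ligatures such as æ, or ß) to a standard representation before comparing. Here are some cases where the function should return -1 but returns something else: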

encyclopædia = encyclopaedia
Archæology = Archaeology
ARCHÆOLOGY = archaeology
Archæology = archaeology
Weißbier = WEISSBIER

How can I tell whether a char needs to be expanded, and convert each char to its expanded form when it does?

/// <summary>
/// Gets the index of the first character at which two strings differ
/// </summary>
/// <param name="a">First string</param>
/// <param name="b">Second string</param>
/// <param name="compareSmallest">
/// If true, returns the index of the first difference, or -1 if the end of the shorter string is reached without finding one
/// (i.e. returns -1 if the shorter string is a prefix of the other).
/// Otherwise, returns -1 only if both strings are identical, and returns the position where the shorter string ends if no difference is found before that.
/// </param>
/// <returns>
/// Returns first difference index or -1 if no difference is found
/// </returns>
public static int FirstUnmatchedIndex(this string a, string b, bool compareSmallest = false, StringComparison comparisonType = StringComparison.CurrentCulture)
{
    //Treat null as empty
    if (String.IsNullOrEmpty(a)) {
        if (String.IsNullOrEmpty(b)) {
            //Equal, both empty.
            return -1;
        } else {
            //If compareSmallest, empty is always found in longest.
            //Otherwise, difference at pos 0.
            return compareSmallest ? -1 : 0;
        }
    }
    if (object.ReferenceEquals(a, b)) {
        //Same Ref.
        return -1;
    }

    //Convert strings before compare.
    switch (comparisonType) {
        case StringComparison.CurrentCulture:
            //FIXME
            break;
        case StringComparison.CurrentCultureIgnoreCase:
            //FIXME
            var currentCulture = System.Globalization.CultureInfo.CurrentCulture;
            a = a.ToLower(currentCulture);
            b = b.ToLower(currentCulture);
            break;
        case StringComparison.InvariantCulture:
            //FIXME
            break;
        case StringComparison.InvariantCultureIgnoreCase:
            //FIXME
            a = a.ToLowerInvariant();
            b = b.ToLowerInvariant();
            break;
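        case StringComparison.OrdinalIgnoreCase:
            //Ordinal ignore-case: use invariant upper-casing rather than the culture-sensitive ToLower().
            a = a.ToUpperInvariant();
            b = b.ToUpperInvariant();
            break;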
        case StringComparison.Ordinal:
            //Ordinal (binary) compare, nothing special to do.
        default:
            break;
    }

    string longStr = a.Length > b.Length ? a : b;
    string shortStr = a.Length > b.Length ? b : a;

    int count = shortStr.Length;
    for (int idx = 0; idx < count; idx++) {
        //FIXME Check if char needs to be expanded ?
        if (shortStr[idx] != longStr[idx]) {
            return idx;
        }
    }
    return compareSmallest || longStr.Length == count ? -1 : count;
}

Joh*_*nyL 0

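I'm not sure I understand your question correctly, but you could use a "dictionary + regex" combination. The idea is to build a dictionary of the characters you want to expand and find them with the help of a regular expression. The code sample in this answer shows how to do that.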


Regex explanation:

  • (?i) - enables case-insensitive matching (same as RegexOptions.IgnoreCase, but inline)
  • [^\p{IsBasicLatin}]+ - matches every run of characters that do not belong to the Basic Latin block (\u0000 through \u007F).
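To see what the pattern described above actually picks out, here is a minimal sketch. The sample sentence and the `PatternDemo` class are made up for illustration; only the pattern comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternDemo
{
    static void Main()
    {
        // Same pattern as in the answer: runs of characters outside \u0000-\u007F.
        var pattern = @"(?i)[^\p{IsBasicLatin}]+";

        // Hypothetical input, chosen to contain two non-ASCII letters.
        foreach (Match m in Regex.Matches("Weißbier and Archæology", pattern))
        {
            Console.WriteLine($"{m.Index}: {m.Value}");
        }
        // Prints:
        // 3: ß
        // 17: æ
    }
}
```

Each match carries its position in the original string (`m.Index`), which matters if the goal is to report an index rather than just a normalized string.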

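The code uses ToLower so that uppercase non-Latin characters do not have to be added to the dictionary. Of course, you can skip that if you prefer to be explicit (i.e. add every lowercase and uppercase character to the dictionary and remove the ToLower call).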
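A minimal sketch of that explicit variant might look like the following. The `Expander` class, its `Map` table and the `Expand` method are hypothetical names, not part of the answer. Two further assumptions differ from the answer's pattern: the `+` quantifier is dropped so that each character is looked up on its own (a run such as "æß" would otherwise be handed to the dictionary as one key), and characters missing from the table are left unchanged instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class Expander
{
    // Hypothetical table: both cases are listed explicitly, so no ToLower call is needed.
    static readonly Dictionary<string, string> Map = new Dictionary<string, string>
    {
        ["æ"] = "ae",
        ["Æ"] = "AE",
        ["ß"] = "ss"
    };

    // One non-Basic-Latin character per match (no + quantifier).
    static readonly Regex NonBasicLatin = new Regex(@"[^\p{IsBasicLatin}]");

    public static string Expand(string text)
    {
        return NonBasicLatin.Replace(text, m =>
        {
            string replacement;
            // Unmapped characters are kept as they are.
            return Map.TryGetValue(m.Value, out replacement) ? replacement : m.Value;
        });
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(Expander.Expand("ARCHÆOLOGY")); // ARCHAEOLOGY
        Console.WriteLine(Expander.Expand("Weißbier"));   // Weissbier
        Console.WriteLine(Expander.Expand("naïve"));      // naïve (not in the table)
    }
}
```

Note that expanding changes string lengths, so an index found by comparing expanded strings refers to the expanded text, not to the original input.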

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using static System.Console;

var dic = new Dictionary<string, string>
{
    ["æ"] = "ae",
    ["ß"] = "ss"
};

var words = new[] { "encyclopædia", "Archæology", "ARCHÆOLOGY", "Archæology", "Weißbier" };
var pattern = @"(?i)[^\p{IsBasicLatin}]+";

int x = -1;
foreach(var word in words)
{
    // Each match (m.Value) is lower-cased and looked up in the dictionary
    words[++x] = Regex.Replace(word, pattern, m => dic[m.Value.ToLower()]);
}
words.ToList().ForEach(WriteLine);

/*
    Output:
        encyclopaedia
        Archaeology
        ARCHaeOLOGY
        Archaeology
        Weissbier
*/
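As an alternative to maintaining an expansion table by hand, the comparison can be left to the culture's own rules through `CompareInfo`. The sketch below is not the asker's method and not part of the answer above. `CultureDiff` is a hypothetical helper that grows a prefix of `a` one text element at a time and asks `CompareInfo.IsPrefix` whether it is still a prefix of `b`. Whether pairs such as "Archæology" / "archaeology" or "Weißbier" / "WEISSBIER" are treated as equal depends on the culture and on the collation backend the runtime uses, so no result is claimed for them here.

```csharp
using System;
using System.Globalization;

static class CultureDiff
{
    // Hypothetical helper: returns -1 if the strings compare equal under the given
    // rules, the index in `a` of the first text element at which `a` stops being a
    // prefix of `b`, or a.Length if all of `a` is a prefix of a longer `b`.
    public static int FirstUnmatchedIndex(string a, string b, CompareInfo compareInfo, CompareOptions options)
    {
        // Treat null as empty, as the question's code does.
        a = a ?? string.Empty;
        b = b ?? string.Empty;

        if (compareInfo.Compare(a, b, options) == 0) return -1;

        // Start indices of each text element (base character plus combining marks).
        int[] starts = StringInfo.ParseCombiningCharacters(a);
        for (int i = 0; i < starts.Length; i++)
        {
            int end = i + 1 < starts.Length ? starts[i + 1] : a.Length;
            if (!compareInfo.IsPrefix(b, a.Substring(0, end), options))
                return starts[i];
        }
        return a.Length;
    }
}

static class Program
{
    static void Main()
    {
        CompareInfo ci = CultureInfo.CurrentCulture.CompareInfo;
        Console.WriteLine(CultureDiff.FirstUnmatchedIndex("abc", "abd", ci, CompareOptions.IgnoreCase)); // 2
        Console.WriteLine(CultureDiff.FirstUnmatchedIndex("abc", "ABC", ci, CompareOptions.IgnoreCase)); // -1

        // Results for these depend on culture and collation backend.
        Console.WriteLine(CultureDiff.FirstUnmatchedIndex("Archæology", "archaeology", ci, CompareOptions.IgnoreCase));
        Console.WriteLine(CultureDiff.FirstUnmatchedIndex("Weißbier", "WEISSBIER", ci, CompareOptions.IgnoreCase));
    }
}
```

The returned index always refers to the original string `a`, which sidesteps the index shift that expansion introduces. The cost is a quadratic number of character comparisons in the worst case, which is usually acceptable for short inputs; a binary search over the prefix length would be one way to reduce it.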