Faster way to perform multiple string replacements

cai*_*rnz 23 c# regex

I need to do the following:

    using System.Text.RegularExpressions;

    static string[] pats = { "å", "Å", "æ", "Æ", "ä", "Ä", "ö", "Ö", "ø", "Ø", "è", "È", "à", "À", "ì", "Ì", "õ", "Õ", "ï", "Ï" };
    static string[] repl = { "a", "A", "a", "A", "a", "A", "o", "O", "o", "O", "e", "E", "a", "A", "i", "I", "o", "O", "i", "I" };
    static int i = pats.Length;
    int j;

    // Function for the replacement(s): one Regex.Replace pass per pattern
    public string DoRepl(string Inp)
    {
        string tmp = Inp;
        for (j = 0; j < i; j++)
        {
            tmp = Regex.Replace(tmp, pats[j], repl[j]);
        }
        return tmp;
    }
    /* Main flow processes about 45000 lines of input */

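Each line has 6 elements that go through DoRepl, which comes to about 300,000 function calls. Each call does 20 Regex.Replace passes, for a total of about 6 million replacements.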

Is there a more elegant way to do this in fewer passes?

Jes*_*det 21

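using System.Collections.Generic;
using System.Linq;

// Map each accented character to its replacement (add the remaining pairs from the question).
static Dictionary<char, char> repl = new Dictionary<char, char>() { { 'å', 'a' }, { 'ø', 'o' } }; // etc...
public string DoRepl(string Inp)
{
    // One pass over the input: each character is looked up once.
    var tmp = Inp.Select(c =>
    {
        char r;
        if (repl.TryGetValue(c, out r))
            return r;   // mapped character
        return c;       // no mapping: keep as-is
    });
    return new string(tmp.ToArray());
}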

Each character is checked against the dictionary only once, and replaced if it is found there.
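A self-contained sketch of the same single-pass idea, filled in with all 20 pairs from the question. It swaps the LINQ `Select` for a plain loop over a `char[]`; the class name `AccentFolder` is a hypothetical choice, and whether the loop is measurably faster than the LINQ version at ~300,000 calls is an assumption worth benchmarking.

```csharp
using System;
using System.Collections.Generic;

// Sketch: single-pass character replacement with a lookup table built
// from the question's pats/repl arrays (every pattern is a single character).
public static class AccentFolder
{
    // Same pairs as the question's pats/repl arrays, expressed as chars.
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        ['å'] = 'a', ['Å'] = 'A', ['æ'] = 'a', ['Æ'] = 'A', ['ä'] = 'a', ['Ä'] = 'A',
        ['ö'] = 'o', ['Ö'] = 'O', ['ø'] = 'o', ['Ø'] = 'O', ['è'] = 'e', ['È'] = 'E',
        ['à'] = 'a', ['À'] = 'A', ['ì'] = 'i', ['Ì'] = 'I', ['õ'] = 'o', ['Õ'] = 'O',
        ['ï'] = 'i', ['Ï'] = 'I',
    };

    public static string DoRepl(string input)
    {
        // Copy the characters once, then overwrite only those that have a mapping.
        char[] buffer = input.ToCharArray();
        for (int k = 0; k < buffer.Length; k++)
        {
            if (Map.TryGetValue(buffer[k], out char r))
                buffer[k] = r;
        }
        return new string(buffer);
    }
}
```

Because every pattern in the question is a single character, the whole job becomes one scan per element instead of 20.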


Jon*_*röm 12

How about this "trick"?

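// Relies on the encoder mapping characters the Cyrillic code page lacks (å, ø, ...)
// to plain look-alike letters (best-fit fallback) before the bytes are read back as ASCII.
// On .NET Core / .NET 5+, this code page is only available after
// Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has been called.
string conv = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(input));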
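A different, commonly used technique (not the encoding trick above, and not part of this answer) is Unicode normalization: `string.Normalize(NormalizationForm.FormD)` splits a letter such as `å` into `a` plus a combining mark, and the marks can then be dropped. A minimal sketch, assuming .NET's built-in normalization support; `DiacriticStripper` and `Strip` are hypothetical names. `ø`, `Ø`, `æ` and `Æ` have no decomposition in Unicode, so they still need an explicit map.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Sketch: decompose with normalization form D, drop the combining marks,
// and map the letters that have no decomposition (æ, ø) explicitly.
public static class DiacriticStripper
{
    // ø/Ø and æ/Æ are standalone letters in Unicode, so FormD leaves them intact.
    private static readonly Dictionary<char, char> NoDecomposition = new Dictionary<char, char>
    {
        ['ø'] = 'o', ['Ø'] = 'O', ['æ'] = 'a', ['Æ'] = 'A',
    };

    public static string Strip(string input)
    {
        // FormD splits e.g. 'å' into 'a' followed by U+030A COMBINING RING ABOVE.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Skip the combining marks that FormD split off from their base letter.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;
            sb.Append(NoDecomposition.TryGetValue(c, out char r) ? r : c);
        }
        // Recompose any characters FormD split apart that were not removed above.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

This strips marks from every accented letter (for example `é` or `ñ`), not just the 20 characters in the question, which may or may not be what you want.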


Ste*_*ger 10

It will probably be faster without regular expressions.

    for (j = 0; j < i; j++)
    {
        // string.Replace does a literal (ordinal) match, so no regex is parsed or run
        tmp = tmp.Replace(pats[j], repl[j]);
    }
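For reference, a complete version of this loop, assuming (as in the question) that every pattern is a single character, which allows the `string.Replace(char, char)` overload. `PlainReplace` is a hypothetical name; the arrays are the question's `pats`/`repl` written as strings.

```csharp
using System;

// Sketch: the question's loop with Regex.Replace swapped for string.Replace.
// Every pattern here is a single character, so Replace(char, char) is enough:
// it swaps every exact occurrence of one character for another.
public static class PlainReplace
{
    // Same pairs as the question's pats/repl arrays.
    private static readonly char[] Pats = "åÅæÆäÄöÖøØèÈàÀìÌõÕïÏ".ToCharArray();
    private static readonly char[] Repl = "aAaAaAoOoOeEaAiIoOiI".ToCharArray();

    public static string DoRepl(string input)
    {
        string tmp = input;
        for (int j = 0; j < Pats.Length; j++)
        {
            // Still one scan of the string per pattern (20 in total),
            // but each scan is a simple character comparison.
            tmp = tmp.Replace(Pats[j], Repl[j]);
        }
        return tmp;
    }
}
```

This removes the regex overhead but not the repeated scans: it is still 20 passes per call.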

Edit

Another way, using Zip and StringBuilder:

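using System.Linq;
using System.Text;

StringBuilder result = new StringBuilder(input);
// Pair each pattern with its replacement and apply them one after another.
foreach (var zipped in patterns.Zip(replacements, (p, r) => new { p, r }))
{
  result = result.Replace(zipped.p, zipped.r);
}
return result.ToString();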
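A runnable version of the `Zip` + `StringBuilder` approach, with the question's arrays filled in as `Patterns`/`Replacements` (hypothetical names, as is the `BuilderReplace` class). `StringBuilder.Replace` modifies the builder in place and returns the same builder, so the passes work on one buffer instead of producing a new string each time.

```csharp
using System;
using System.Linq;
using System.Text;

// Sketch of the Zip + StringBuilder idea as a complete method.
public static class BuilderReplace
{
    // Same pairs as the question's pats/repl arrays.
    private static readonly string[] Patterns = { "å", "Å", "æ", "Æ", "ä", "Ä", "ö", "Ö", "ø", "Ø", "è", "È", "à", "À", "ì", "Ì", "õ", "Õ", "ï", "Ï" };
    private static readonly string[] Replacements = { "a", "A", "a", "A", "a", "A", "o", "O", "o", "O", "e", "E", "a", "A", "i", "I", "o", "O", "i", "I" };

    public static string DoRepl(string input)
    {
        var result = new StringBuilder(input);
        // Pair Patterns[k] with Replacements[k] and apply each pair to the same builder.
        foreach (var zipped in Patterns.Zip(Replacements, (p, r) => new { p, r }))
        {
            result.Replace(zipped.p, zipped.r);
        }
        return result.ToString();
    }
}
```

It is still 20 scans per call; the single-pass dictionary lookup in the first answer avoids that.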