What is the fastest way to replace multiple characters in a string?

Dej*_*vić 26 c# regex string performance replace

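I am importing some records with multiple string fields from an old database into a new one. It seems to be very slow, and I suspect it is because I do this: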

foreach (var oldObj in oldDB)
{
    NewObject newObj = new NewObject();
    newObj.Name = oldObj.Name.Trim().Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š')
        .Replace(']', '?').Replace('`', 'ž').Replace('}', '?')
        .Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
    newObj.Surname = oldObj.Surname.Trim().Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š')
        .Replace(']', '?').Replace('`', 'ž').Replace('}', '?')
        .Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
    newObj.Address = oldObj.Address.Trim().Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š')
        .Replace(']', '?').Replace('`', 'ž').Replace('}', '?')
        .Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
    newObj.Note = oldObj.Note.Trim().Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š')
        .Replace(']', '?').Replace('`', 'ž').Replace('}', '?')
        .Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
    /*
    ... some processing ...
    */
}

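Now, I have read some posts and articles around the net and seen many different ideas. Some say it would be better to use a regex with a MatchEvaluator; others say it is best to leave it as is.

While it might have been easier to just write a benchmark myself, I decided to ask here in case someone else has been wondering the same thing, or someone already knows the answer.

So what is the fastest way to do this in C#?

EDIT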

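I have posted the benchmark here. At first glance, Richard's way looked like the fastest. However, neither his way nor Marc's did anything, because the regex pattern was wrong. After correcting the pattern from

@"\^@\[\]`\}~\{\\" 

to

@"\^|@|\[|\]|`|\}|~|\{|\\" 

it appears that the old way, with chained .Replace() calls, is the fastest after all.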
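To see why the first pattern replaced nothing, it may help to count matches. The sketch below is only an illustration: the class and member names (`PatternDemo`, `CountMatches`) are made up for this example, and the character-class form is the pattern from Richard's answer. The uncorrected pattern is a plain concatenation, so it can only match the whole nine-character sequence, while the alternation and the character class match each character on its own.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original benchmark.
public static class PatternDemo
{
    // Uncorrected pattern: a concatenation, so it matches only the literal
    // nine-character sequence ^@[]`}~{\ appearing as a whole.
    public const string Sequence = @"\^@\[\]`\}~\{\\";

    // Corrected pattern: alternation, one branch per character.
    public const string Alternation = @"\^|@|\[|\]|`|\}|~|\{|\\";

    // Equivalent character class, as used in Richard's answer.
    public const string CharClass = @"[@^\[\]`}~{\\]";

    // Counts how many non-overlapping matches the pattern finds in the input.
    public static int CountMatches(string pattern, string input)
    {
        return Regex.Matches(input, pattern).Count;
    }
}
```

With the benchmark string `"A^@[BCD"`, the concatenated pattern finds no match at all, whereas the other two forms each find the three characters that need replacing.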

Dej*_*vić 27

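Thanks for your input. I wrote a quick and dirty benchmark to test your suggestions. I tested parsing 4 strings over 500,000 iterations and did 4 passes. The results are as follows: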

*** Pass 1
Old (Chained String.Replace()) way completed in 814 ms
logicnp (ToCharArray) way completed in 916 ms
oleksii (StringBuilder) way completed in 943 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2551 ms
Richard (Regex w/ MatchEvaluator) way completed in 215 ms
Marc Gravell (Static Regex) way completed in 1008 ms

*** Pass 2
Old (Chained String.Replace()) way completed in 786 ms
logicnp (ToCharArray) way completed in 920 ms
oleksii (StringBuilder) way completed in 905 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2515 ms
Richard (Regex w/ MatchEvaluator) way completed in 217 ms
Marc Gravell (Static Regex) way completed in 1025 ms

*** Pass 3
Old (Chained String.Replace()) way completed in 775 ms
logicnp (ToCharArray) way completed in 903 ms
oleksii (StringBuilder) way completed in 931 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2529 ms
Richard (Regex w/ MatchEvaluator) way completed in 214 ms
Marc Gravell (Static Regex) way completed in 1022 ms

*** Pass 4
Old (Chained String.Replace()) way completed in 799 ms
logicnp (ToCharArray) way completed in 908 ms
oleksii (StringBuilder) way completed in 938 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2592 ms
Richard (Regex w/ MatchEvaluator) way completed in 225 ms
Marc Gravell (Static Regex) way completed in 1050 ms

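The code for this benchmark is below. Please review it and confirm that @Richard has the fastest way. Note that I did not check whether the outputs were correct; I assumed they were.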

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Text.RegularExpressions;

namespace StringReplaceTest
{
    class Program
    {
        static string test1 = "A^@[BCD";
        static string test2 = "E]FGH\\";
        static string test3 = "ijk`l}m";
        static string test4 = "nopq~{r";

        static readonly Dictionary<char, string> repl =
            new Dictionary<char, string> 
            { 
                {'^', "?"}, {'@', "Ž"}, {'[', "Š"}, {']', "?"}, {'`', "ž"}, {'}', "?"}, {'~', "?"}, {'{', "š"}, {'\\', "?"} 
            };

        static readonly Regex replaceRegex;

        static Program() // static initializer 
        {
            StringBuilder pattern = new StringBuilder().Append('[');
            foreach (var key in repl.Keys)
                pattern.Append(Regex.Escape(key.ToString()));
            pattern.Append(']');
            replaceRegex = new Regex(pattern.ToString(), RegexOptions.Compiled);
        }

        public static string Sanitize(string input)
        {
            return replaceRegex.Replace(input, match =>
            {
                return repl[match.Value[0]];
            });
        } 

        static string DoGeneralReplace(string input) 
        { 
            var sb = new StringBuilder(input);
            return sb.Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', '?').Replace('`', 'ž').Replace('}', '?').Replace('~', '?').Replace('{', 'š').Replace('\\', '?').ToString(); 
        }

        //Method for replacing chars with a mapping 
        static string Replace(string input, IDictionary<char, char> replacementMap)
        {
            return replacementMap.Keys
                .Aggregate(input, (current, oldChar)
                    => current.Replace(oldChar, replacementMap[oldChar]));
        } 

        static void Main(string[] args)
        {
            for (int i = 1; i < 5; i++)
                DoIt(i);
        }

        static void DoIt(int n)
        {
            Stopwatch sw = new Stopwatch();
            int idx = 0;

            Console.WriteLine("*** Pass " + n.ToString());
            // old way
            sw.Start();
            for (idx = 0; idx < 500000; idx++)
            {
                string result1 = test1.Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', '?').Replace('`', 'ž').Replace('}', '?').Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
                string result2 = test2.Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', '?').Replace('`', 'ž').Replace('}', '?').Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
                string result3 = test3.Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', '?').Replace('`', 'ž').Replace('}', '?').Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
                string result4 = test4.Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', '?').Replace('`', 'ž').Replace('}', '?').Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
            }
            sw.Stop();
            Console.WriteLine("Old (Chained String.Replace()) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

            Dictionary<char, char> replacements = new Dictionary<char, char>();
            replacements.Add('^', '?');
            replacements.Add('@', 'Ž');
            replacements.Add('[', 'Š');
            replacements.Add(']', '?');
            replacements.Add('`', 'ž');
            replacements.Add('}', '?');
            replacements.Add('~', '?');
            replacements.Add('{', 'š');
            replacements.Add('\\', '?');

            // logicnp way
            sw.Reset();
            sw.Start();
            for (idx = 0; idx < 500000; idx++)
            {
                char[] charArray1 = test1.ToCharArray();
                for (int i = 0; i < charArray1.Length; i++)
                {
                    char newChar;
                    if (replacements.TryGetValue(test1[i], out newChar))
                        charArray1[i] = newChar;
                }
                string result1 = new string(charArray1);

                char[] charArray2 = test2.ToCharArray();
                for (int i = 0; i < charArray2.Length; i++)
                {
                    char newChar;
                    if (replacements.TryGetValue(test2[i], out newChar))
                        charArray2[i] = newChar;
                }
                string result2 = new string(charArray2);

                char[] charArray3 = test3.ToCharArray();
                for (int i = 0; i < charArray3.Length; i++)
                {
                    char newChar;
                    if (replacements.TryGetValue(test3[i], out newChar))
                        charArray3[i] = newChar;
                }
                string result3 = new string(charArray3);

                char[] charArray4 = test4.ToCharArray();
                for (int i = 0; i < charArray4.Length; i++)
                {
                    char newChar;
                    if (replacements.TryGetValue(test4[i], out newChar))
                        charArray4[i] = newChar;
                }
                string result4 = new string(charArray4);
            }
            sw.Stop();
            Console.WriteLine("logicnp (ToCharArray) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

            // oleksii way
            sw.Reset();
            sw.Start();
            for (idx = 0; idx < 500000; idx++)
            {
                string result1 = DoGeneralReplace(test1);
                string result2 = DoGeneralReplace(test2);
                string result3 = DoGeneralReplace(test3);
                string result4 = DoGeneralReplace(test4);
            }
            sw.Stop();
            Console.WriteLine("oleksii (StringBuilder) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

            // André Christoffer Andersen way
            sw.Reset();
            sw.Start();
            for (idx = 0; idx < 500000; idx++)
            {
                string result1 = Replace(test1, replacements);
                string result2 = Replace(test2, replacements);
                string result3 = Replace(test3, replacements);
                string result4 = Replace(test4, replacements);
            }
            sw.Stop();
            Console.WriteLine("André Christoffer Andersen (Lambda w/ Aggregate) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

            // Richard way
            sw.Reset();
            sw.Start();
            Regex reg = new Regex(@"\^|@|\[|\]|`|\}|~|\{|\\");
            MatchEvaluator eval = match =>
            {
                switch (match.Value)
                {
                    case "^": return "?";
                    case "@": return "Ž";
                    case "[": return "Š";
                    case "]": return "?";
                    case "`": return "ž";
                    case "}": return "?";
                    case "~": return "?";
                    case "{": return "š";
                    case "\\": return "?";
                    default: throw new Exception("Unexpected match!");
                }
            };
            for (idx = 0; idx < 500000; idx++)
            {
                string result1 = reg.Replace(test1, eval);
                string result2 = reg.Replace(test2, eval);
                string result3 = reg.Replace(test3, eval);
                string result4 = reg.Replace(test4, eval);
            }
            sw.Stop();
            Console.WriteLine("Richard (Regex w/ MatchEvaluator) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");

            // Marc Gravell way
            sw.Reset();
            sw.Start();
            for (idx = 0; idx < 500000; idx++)
            {
                string result1 = Sanitize(test1);
                string result2 = Sanitize(test2);
                string result3 = Sanitize(test3);
                string result4 = Sanitize(test4);
            }
            sw.Stop();
            Console.WriteLine("Marc Gravell (Static Regex) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms\n");
        }
    }
}

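  • It is no surprise that regex is faster. It is built to search through strings extremely efficiently. Always remember that the law of the instrument is bad: use the technology that was built for what you are trying to do, so don't be afraid to use regex. C# isn't good at everything just because it has an API for it. Good question and good benchmark, @Dejan. (2 upvotes)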
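  • One thing I would add: your test strings are very short. While that may match your real data (in which case your benchmark is fine and spot on), it will skew the results for long strings, for different numbers of characters to replace, and so on. I suspect this is responsible for the comparatively good performance of string.Replace: it does create strings over and over (although only when something changes), but both the loops and the resulting strings are very small, so it doesn't cost much. The difference will be more visible with long strings. (2 upvotes)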
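  • I think the Regex method is the fastest only because the pattern is missing the '[' at the start and the ']' at the end. As the sample shows, nothing is replaced, because nothing matches! I think that simply explains the huge difference between the two regex methods. (2 upvotes)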
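  • I ran this test in 3 versions (different lengths and replacements); logicnp was always the best, followed by oleksii. It was even the best in my run of the original test? .NET 4.5, Release, Win7, i7 (2 upvotes)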
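  • To summarize: the original numbers obviously don't reflect the fact that the regex was broken. I re-ran the tests (adding the method I proposed) with a check that the results were correct. Then I took the check out and re-ran them. I also ran the MatchEvaluator method with the same static regex, with the regex created inside the test run, and with Regex.Replace(input, pattern, eval). The MatchEvaluator methods were consistently the slowest. Given the test data, chained StringBuilder.Replace was consistently the fastest, followed by ToCharArray, the string.Replace chain, my method, Marc's, and then Richard's. (2 upvotes)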
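Since the benchmark never checked its outputs, a correctness check along the following lines could be run before timing anything. This is a minimal sketch under assumptions: the class and method names (`ReplaceCheck`, `Chained`, `ViaCharArray`, `ViaRegex`, `AllAgree`) are invented for illustration, the mapping is copied from the question exactly as it appears there, and the regex is the character class from Richard's answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical verification helper, not part of the original benchmark.
public static class ReplaceCheck
{
    // Mapping copied from the question as shown there.
    static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        {'^', '?'}, {'@', 'Ž'}, {'[', 'Š'}, {']', '?'}, {'`', 'ž'},
        {'}', '?'}, {'~', '?'}, {'{', 'š'}, {'\\', '?'}
    };

    static readonly Regex CharClass =
        new Regex(@"[@^\[\]`}~{\\]", RegexOptions.Compiled);

    // Reference implementation: the chained string.Replace from the question.
    public static string Chained(string s)
    {
        return s.Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š')
                .Replace(']', '?').Replace('`', 'ž').Replace('}', '?')
                .Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
    }

    // Single pass over a char array with a dictionary lookup.
    public static string ViaCharArray(string s)
    {
        char[] chars = s.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            char replacement;
            if (Map.TryGetValue(chars[i], out replacement))
                chars[i] = replacement;
        }
        return new string(chars);
    }

    // Regex with a MatchEvaluator that looks the match up in the same map.
    public static string ViaRegex(string s)
    {
        return CharClass.Replace(s, m => Map[m.Value[0]].ToString());
    }

    // True when every approach yields the same output as the reference.
    public static bool AllAgree(string input)
    {
        string expected = Chained(input);
        return ViaCharArray(input) == expected && ViaRegex(input) == expected;
    }
}
```

Calling `ReplaceCheck.AllAgree` on each test string before the timed loops would have exposed a pattern that matches nothing, because the regex output would then differ from the chained result.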

Ric*_*ard 15

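the fastest way

The only way to know is to compare the performance yourself. Try the approach from the question, a StringBuilder, and also Regex.Replace.

But micro-benchmarks don't take the scope of the whole system into account. If this method is only a small fraction of the overall system, its performance probably doesn't matter to the performance of the application as a whole.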
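For comparing candidates yourself, a small timing helper can keep the measurement code identical for every approach. The following is a rough sketch, not a rigorous benchmark harness: `Bench` and `Measure` are names made up for this example, and the single warm-up pass is an assumption about what is enough to keep JIT compilation out of the timed region.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper for comparing string transformations.
public static class Bench
{
    // Runs `transform` over every input `iterations` times and returns the
    // elapsed milliseconds. One untimed warm-up pass runs first.
    public static long Measure(Func<string, string> transform, string[] inputs, int iterations)
    {
        foreach (string s in inputs)
            transform(s); // warm-up, not timed

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            foreach (string s in inputs)
                transform(s);
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

A call such as `Bench.Measure(s => s.Replace('^', '?'), new[] { "A^@[BCD" }, 500000)` times one candidate; passing each approach through the same helper keeps the loops comparable.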

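Some notes:

  1. Using String as above will (I assume) create many intermediate strings: more work for the GC. But it is simple.
  2. Using StringBuilder allows each replacement to modify the same underlying data. This should create less garbage. It is almost as simple as using String.
  3. Using a regex is the most complex (because you need code to work out the replacement), but it allows a single expression. I would expect this to be slower, unless the list of replacements is very large and replacements are rare in the input string (i.e. most replace calls would replace nothing and only incur the cost of searching the string).

I would expect #2 to be slightly faster over repeated use (thousands of times) because of the lower GC load.

For the regex approach you need something like: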

newObj.Name = Regex.Replace(oldObj.Name.Trim(), @"[@^\[\]`}~{\\]", match => {
  switch (match.Value) {
    case "^": return "?";
    case "@": return "Ž";
    case "[": return "Š";
    case "]": return "?";
    case "`": return "ž";
    case "}": return "?";
    case "~": return "?";
    case "{": return "š";
    case "\\": return "?";
    default: throw new Exception("Unexpected match!");
  }
});

This can be made reusable by parameterising it with a Dictionary<char,char> that holds the replacements, together with a reusable MatchEvaluator.
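One way the reusable form might look is sketched below. It is an illustration only: the `CharReplacer` class and its members are hypothetical names, and building the pattern as an alternation of escaped keys is a choice made for this sketch. The alternation is used because `Regex.Escape` does not escape `]` or `}`, which would matter inside a character class but is harmless in an alternation.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical reusable replacer built from a Dictionary<char, char>.
public sealed class CharReplacer
{
    private readonly Dictionary<char, char> _map;
    private readonly Regex _regex;
    private readonly MatchEvaluator _evaluator;

    public CharReplacer(IDictionary<char, char> map)
    {
        if (map == null || map.Count == 0)
            throw new ArgumentException("map must contain at least one entry", "map");

        _map = new Dictionary<char, char>(map);

        // One alternation branch per key, each escaped for use in a pattern.
        var branches = new List<string>();
        foreach (char key in _map.Keys)
            branches.Add(Regex.Escape(key.ToString()));

        // The regex and the evaluator are created once and reused per call.
        _regex = new Regex(string.Join("|", branches), RegexOptions.Compiled);
        _evaluator = match => _map[match.Value[0]].ToString();
    }

    // Replaces every mapped character in a single regex pass.
    public string Replace(string input)
    {
        return _regex.Replace(input, _evaluator);
    }
}
```

An instance would be created once, outside the import loop, and then called for each field, for example `newObj.Name = replacer.Replace(oldObj.Name.Trim());` where `replacer` is the shared instance.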


log*_*cnp 9

Try this:

Dictionary<char, char> replacements = new Dictionary<char, char>();
// populate replacements

string str = "mystring";
char []charArray = str.ToCharArray();

for (int i = 0; i < charArray.Length; i++)
{
    char newChar;
    if (replacements.TryGetValue(str[i], out newChar))
        charArray[i] = newChar;
}

string newStr = new string(charArray);

  • @Steve - IndexOfAny will also use a loop internally. There is no way to avoid a single loop. (2 upvotes)
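The single loop can also be wrapped in a method that allocates nothing when no character needs replacing. The sketch below is one possible variation, not the answer's own code: `SinglePass` and `ReplaceChars` are hypothetical names, and copying the string lazily on the first hit is an addition made for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical single-pass replacer with a lazily allocated buffer.
public static class SinglePass
{
    public static string ReplaceChars(string input, IDictionary<char, char> replacements)
    {
        if (string.IsNullOrEmpty(input))
            return input;

        char[] buffer = null; // copied from the input only on the first hit

        for (int i = 0; i < input.Length; i++)
        {
            char newChar;
            if (replacements.TryGetValue(input[i], out newChar))
            {
                if (buffer == null)
                    buffer = input.ToCharArray();
                buffer[i] = newChar;
            }
        }

        // No hit means no copy was made, so the original instance is returned.
        return buffer == null ? input : new string(buffer);
    }
}
```

For records where most fields contain none of the mapped characters, this avoids creating a new string per field while still visiting each character only once.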

ole*_*sii 6

One possible solution is to use the StringBuilder class.

You can first refactor the code into a single method:

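public string DoGeneralReplace(string input)
{
    // StringBuilder.Replace modifies the builder in place and returns it,
    // so the calls can be chained without creating intermediate strings.
    var sb = new StringBuilder(input);
    sb.Replace("^", "?")
      .Replace("@", "Ž") ...; // remaining replacements elided
    return sb.ToString();
}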


//usage
foreach (var oldObj in oldDB)
{
    NewObject newObj = new NewObject();
    newObj.Name = DoGeneralReplace(oldObj.Name);
    ...
}