Dej*_*vić 26 c# regex string performance replace
I am importing records with several string fields from an old database into a new one. It seems very slow, and I suspect it is because I do this:
foreach (var oldObj in oldDB)
{
    NewObject newObj = new NewObject();
    newObj.Name = oldObj.Name.Trim().Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š')
        .Replace(']', '?').Replace('`', 'ž').Replace('}', '?')
        .Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
    newObj.Surname = oldObj.Surname.Trim().Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š')
        .Replace(']', '?').Replace('`', 'ž').Replace('}', '?')
        .Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
    newObj.Address = oldObj.Address.Trim().Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š')
        .Replace(']', '?').Replace('`', 'ž').Replace('}', '?')
        .Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
    newObj.Note = oldObj.Note.Trim().Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š')
        .Replace(']', '?').Replace('`', 'ž').Replace('}', '?')
        .Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
    /*
        ... some processing ...
    */
}
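Now, I have read a number of posts and articles on the web and have seen many different ideas. Some say it would be better to use a regex with a MatchEvaluator; others say it is best to leave the code as it is.
While it might be easier to just write a benchmark myself, I decided to ask here in case someone else has been wondering about the same thing, or someone already knows the answer.
So what is the fastest way to do this in C#?
EDIT
I have posted the benchmark here. At first glance it looked as if Richard's way might be the fastest. However, neither his way nor Marc's actually replaced anything, because the regex pattern was wrong. After correcting the pattern from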
@"\^@\[\]`\}~\{\\"
to
@"\^|@|\[|\]|`|\}|~|\{|\\"
it appears that the old way, with chained .Replace() calls, is the fastest after all.
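To illustrate why the first pattern never replaced anything: without alternation, the escaped characters form one literal nine-character sequence, so the regex only matches when all nine characters appear consecutively and in that order. The following is a minimal sketch; the class name PatternDemo and its members are made up for illustration and are not part of the benchmark.

```csharp
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Matches only the literal sequence ^@[]`}~{\ in exactly this order.
    public static readonly Regex Sequence = new Regex(@"\^@\[\]`\}~\{\\");

    // Matches any single one of the nine characters.
    public static readonly Regex Alternation = new Regex(@"\^|@|\[|\]|`|\}|~|\{|\\");

    // Equivalent character class; usually the shorter way to write it.
    public static readonly Regex CharClass = new Regex(@"[@^\[\]`}~{\\]");

    // Counts how many matches 'regex' finds in 'input'.
    public static int CountMatches(Regex regex, string input)
    {
        return regex.Matches(input).Count;
    }
}
```

For the benchmark's first test string, "A^@[BCD", CountMatches returns 0 with Sequence and 3 with either Alternation or CharClass.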
Dej*_*vić 27
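Thanks for your input, everyone. I wrote a quick and dirty benchmark to test your suggestions. I tested processing 4 strings over 500.000 iterations and ran 4 passes. The results are as follows: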
*** Pass 1
Old (Chained String.Replace()) way completed in 814 ms
logicnp (ToCharArray) way completed in 916 ms
oleksii (StringBuilder) way completed in 943 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2551 ms
Richard (Regex w/ MatchEvaluator) way completed in 215 ms
Marc Gravell (Static Regex) way completed in 1008 ms

*** Pass 2
Old (Chained String.Replace()) way completed in 786 ms
logicnp (ToCharArray) way completed in 920 ms
oleksii (StringBuilder) way completed in 905 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2515 ms
Richard (Regex w/ MatchEvaluator) way completed in 217 ms
Marc Gravell (Static Regex) way completed in 1025 ms

*** Pass 3
Old (Chained String.Replace()) way completed in 775 ms
logicnp (ToCharArray) way completed in 903 ms
oleksii (StringBuilder) way completed in 931 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2529 ms
Richard (Regex w/ MatchEvaluator) way completed in 214 ms
Marc Gravell (Static Regex) way completed in 1022 ms

*** Pass 4
Old (Chained String.Replace()) way completed in 799 ms
logicnp (ToCharArray) way completed in 908 ms
oleksii (StringBuilder) way completed in 938 ms
André Christoffer Andersen (Lambda w/ Aggregate) way completed in 2592 ms
Richard (Regex w/ MatchEvaluator) way completed in 225 ms
Marc Gravell (Static Regex) way completed in 1050 ms
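The code for this benchmark is below. Please review the code and confirm that @Richard has the fastest way. Note that I have not checked whether the outputs are correct; I assumed they were.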
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Text.RegularExpressions;
namespace StringReplaceTest
{
class Program
{
static string test1 = "A^@[BCD";
static string test2 = "E]FGH\\";
static string test3 = "ijk`l}m";
static string test4 = "nopq~{r";
static readonly Dictionary<char, string> repl =
new Dictionary<char, string>
{
{'^', "?"}, {'@', "Ž"}, {'[', "Š"}, {']', "?"}, {'`', "ž"}, {'}', "?"}, {'~', "?"}, {'{', "š"}, {'\\', "?"}
};
static readonly Regex replaceRegex;
static Program() // static initializer
{
StringBuilder pattern = new StringBuilder().Append('[');
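foreach (var key in repl.Keys)
{
// Regex.Escape leaves ']' unescaped, which would close the character class early.
pattern.Append(key == ']' ? @"\]" : Regex.Escape(key.ToString()));
}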
pattern.Append(']');
replaceRegex = new Regex(pattern.ToString(), RegexOptions.Compiled);
}
public static string Sanitize(string input)
{
return replaceRegex.Replace(input, match =>
{
return repl[match.Value[0]];
});
}
static string DoGeneralReplace(string input)
{
var sb = new StringBuilder(input);
return sb.Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', '?').Replace('`', 'ž').Replace('}', '?').Replace('~', '?').Replace('{', 'š').Replace('\\', '?').ToString();
}
//Method for replacing chars with a mapping
static string Replace(string input, IDictionary<char, char> replacementMap)
{
return replacementMap.Keys
.Aggregate(input, (current, oldChar)
=> current.Replace(oldChar, replacementMap[oldChar]));
}
static void Main(string[] args)
{
for (int i = 1; i < 5; i++)
DoIt(i);
}
static void DoIt(int n)
{
Stopwatch sw = new Stopwatch();
int idx = 0;
Console.WriteLine("*** Pass " + n.ToString());
// old way
sw.Start();
for (idx = 0; idx < 500000; idx++)
{
string result1 = test1.Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', '?').Replace('`', 'ž').Replace('}', '?').Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
string result2 = test2.Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', '?').Replace('`', 'ž').Replace('}', '?').Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
string result3 = test3.Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', '?').Replace('`', 'ž').Replace('}', '?').Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
string result4 = test4.Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š').Replace(']', '?').Replace('`', 'ž').Replace('}', '?').Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
}
sw.Stop();
Console.WriteLine("Old (Chained String.Replace()) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");
Dictionary<char, char> replacements = new Dictionary<char, char>();
replacements.Add('^', '?');
replacements.Add('@', 'Ž');
replacements.Add('[', 'Š');
replacements.Add(']', '?');
replacements.Add('`', 'ž');
replacements.Add('}', '?');
replacements.Add('~', '?');
replacements.Add('{', 'š');
replacements.Add('\\', '?');
// logicnp way
sw.Reset();
sw.Start();
for (idx = 0; idx < 500000; idx++)
{
char[] charArray1 = test1.ToCharArray();
for (int i = 0; i < charArray1.Length; i++)
{
char newChar;
if (replacements.TryGetValue(test1[i], out newChar))
charArray1[i] = newChar;
}
string result1 = new string(charArray1);
char[] charArray2 = test2.ToCharArray();
for (int i = 0; i < charArray2.Length; i++)
{
char newChar;
if (replacements.TryGetValue(test2[i], out newChar))
charArray2[i] = newChar;
}
string result2 = new string(charArray2);
char[] charArray3 = test3.ToCharArray();
for (int i = 0; i < charArray3.Length; i++)
{
char newChar;
if (replacements.TryGetValue(test3[i], out newChar))
charArray3[i] = newChar;
}
string result3 = new string(charArray3);
char[] charArray4 = test4.ToCharArray();
for (int i = 0; i < charArray4.Length; i++)
{
char newChar;
if (replacements.TryGetValue(test4[i], out newChar))
charArray4[i] = newChar;
}
string result4 = new string(charArray4);
}
sw.Stop();
Console.WriteLine("logicnp (ToCharArray) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");
// oleksii way
sw.Reset();
sw.Start();
for (idx = 0; idx < 500000; idx++)
{
string result1 = DoGeneralReplace(test1);
string result2 = DoGeneralReplace(test2);
string result3 = DoGeneralReplace(test3);
string result4 = DoGeneralReplace(test4);
}
sw.Stop();
Console.WriteLine("oleksii (StringBuilder) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");
// André Christoffer Andersen way
sw.Reset();
sw.Start();
for (idx = 0; idx < 500000; idx++)
{
string result1 = Replace(test1, replacements);
string result2 = Replace(test2, replacements);
string result3 = Replace(test3, replacements);
string result4 = Replace(test4, replacements);
}
sw.Stop();
Console.WriteLine("André Christoffer Andersen (Lambda w/ Aggregate) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");
// Richard way
sw.Reset();
sw.Start();
Regex reg = new Regex(@"\^|@|\[|\]|`|\}|~|\{|\\");
MatchEvaluator eval = match =>
{
switch (match.Value)
{
case "^": return "?";
case "@": return "Ž";
case "[": return "Š";
case "]": return "?";
case "`": return "ž";
case "}": return "?";
case "~": return "?";
case "{": return "š";
case "\\": return "?";
default: throw new Exception("Unexpected match!");
}
};
for (idx = 0; idx < 500000; idx++)
{
string result1 = reg.Replace(test1, eval);
string result2 = reg.Replace(test2, eval);
string result3 = reg.Replace(test3, eval);
string result4 = reg.Replace(test4, eval);
}
sw.Stop();
Console.WriteLine("Richard (Regex w/ MatchEvaluator) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms");
// Marc Gravell way
sw.Reset();
sw.Start();
for (idx = 0; idx < 500000; idx++)
{
string result1 = Sanitize(test1);
string result2 = Sanitize(test2);
string result3 = Sanitize(test3);
string result4 = Sanitize(test4);
}
sw.Stop();
Console.WriteLine("Marc Gravell (Static Regex) way completed in " + sw.ElapsedMilliseconds.ToString() + " ms\n");
}
}
}
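Since the benchmark above does not verify the outputs, a quick sanity check is worth running before trusting the timings. The sketch below is not part of the original benchmark; the class OutputCheck and its method names are hypothetical. It treats the chained String.Replace() calls as the baseline and checks whether a regex-based candidate produces the same strings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OutputCheck
{
    // Baseline: the chained String.Replace() calls from the question.
    public static string Chained(string input)
    {
        return input.Replace('^', '?').Replace('@', 'Ž').Replace('[', 'Š')
                    .Replace(']', '?').Replace('`', 'ž').Replace('}', '?')
                    .Replace('~', '?').Replace('{', 'š').Replace('\\', '?');
    }

    // Builds a candidate that applies 'pattern' and maps each match
    // with the same character mapping as the baseline.
    public static Func<string, string> RegexCandidate(string pattern)
    {
        var regex = new Regex(pattern);
        return input => regex.Replace(input, m => Chained(m.Value));
    }

    // True when the candidate reproduces the baseline output for every input.
    public static bool Agrees(Func<string, string> candidate, params string[] inputs)
    {
        foreach (var input in inputs)
        {
            if (candidate(input) != Chained(input))
                return false;
        }
        return true;
    }
}
```

With the uncorrected pattern from the question's EDIT, Agrees returns false for "A^@[BCD", which is the kind of mistake such a check would surface before any timing is compared.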
Ric*_*ard 15
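The fastest way
The only way to find out is to compare the performance yourself. Try it as in the question, with StringBuilder, and also with Regex.Replace.
But micro-benchmarks do not take the scope of the whole system into account. If this method is only a small fraction of the overall system, its performance probably does not matter to the performance of the application as a whole.
Some notes:
1. Using String as above will (I assume) create many intermediate strings: more work for the GC. But it is simple.
2. Using StringBuilder allows each replacement to modify the same underlying data. This creates less garbage. It is almost as simple as using String.
3. Using a regex is the most complex (because you need code to work out the replacement), but it allows a single expression. I would expect this to be slower unless the list of replacements is very large and replacements are rare in the input string (i.e. most Replace calls replace nothing and only cost a search through the string).
I expect #2 to be slightly quicker over repeated use (thousands of times) due to the lower GC load.
For the regex approach you need something like this: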
newObj.Name = Regex.Replace(oldObj.Name.Trim(), @"[@^\[\]`}~{\\]", match => {
    switch (match.Value) {
        case "^": return "?";
        case "@": return "Ž";
        case "[": return "Š";
        case "]": return "?";
        case "`": return "ž";
        case "}": return "?";
        case "~": return "?";
        case "{": return "š";
        case "\\": return "?";
        default: throw new Exception("Unexpected match!");
    }
});
This can be made reusable by parameterizing it with a Dictionary<char,char> that holds the replacements and by reusing a single MatchEvaluator.
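One way that reusable form might look is sketched below. The class name CharReplacer and its members are assumptions made for illustration, not code from this answer. The pattern is built as an alternation of escaped characters rather than a character class, because Regex.Escape does not escape ']' and an unescaped ']' would end a character class early.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public sealed class CharReplacer
{
    private readonly Dictionary<char, char> map;
    private readonly Regex regex;
    private readonly MatchEvaluator evaluator;

    public CharReplacer(IDictionary<char, char> replacements)
    {
        map = new Dictionary<char, char>(replacements);
        if (map.Count == 0)
            return; // nothing to replace; Apply returns the input unchanged

        // Alternation of escaped single characters, e.g. \^|@|\[|]
        string pattern = string.Join("|", map.Keys.Select(k => Regex.Escape(k.ToString())));
        regex = new Regex(pattern, RegexOptions.Compiled);

        // Created once and reused for every call to Apply.
        evaluator = match => map[match.Value[0]].ToString();
    }

    public string Apply(string input)
    {
        return regex == null ? input : regex.Replace(input, evaluator);
    }
}
```

An instance would be created once, outside the import loop, and then called per field, for example newObj.Name = replacer.Apply(oldObj.Name.Trim()), where replacer is a hypothetical variable holding the CharReplacer.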
Try this:
Dictionary<char, char> replacements = new Dictionary<char, char>();
// populate replacements
string str = "mystring";
char[] charArray = str.ToCharArray();
for (int i = 0; i < charArray.Length; i++)
{
    char newChar;
    if (replacements.TryGetValue(str[i], out newChar))
        charArray[i] = newChar;
}
string newStr = new string(charArray);
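The same loop could be wrapped in a helper so that it can be called once per field. This is only a sketch: the names CharMapper and Map are hypothetical, and the lazy allocation of the buffer is an addition that is not part of the snippet above. It avoids copying strings that contain none of the mapped characters.

```csharp
using System.Collections.Generic;

public static class CharMapper
{
    // Replaces every character that has an entry in 'replacements' in a single pass.
    public static string Map(string input, IDictionary<char, char> replacements)
    {
        if (string.IsNullOrEmpty(input))
            return input;

        char[] buffer = null; // allocated lazily, on the first replaced character
        for (int i = 0; i < input.Length; i++)
        {
            char newChar;
            if (replacements.TryGetValue(input[i], out newChar))
            {
                if (buffer == null)
                    buffer = input.ToCharArray();
                buffer[i] = newChar;
            }
        }

        // No replacement happened: return the original string without allocating.
        return buffer == null ? input : new string(buffer);
    }
}
```

Whether the lazy copy pays off depends on how often the input actually contains one of the mapped characters, so it would need to be measured on the real data.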
One possible solution is to use the StringBuilder class.
You can first refactor the code into a single method:
public string DoGeneralReplace(string input)
{
    var sb = new StringBuilder(input);
    sb.Replace("^", "?")
      .Replace("@", "Ž") ...;
    // Return the modified contents; without this the method does not compile.
    return sb.ToString();
}
//usage
foreach (var oldObj in oldDB)
{
    NewObject newObj = new NewObject();
    newObj.Name = DoGeneralReplace(oldObj.Name);
    ...
}