spe*_*der 7 c# regex string optimization
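After a great deal of measurement, I have identified a hotspot in one of our Windows services that I would like to optimize. We are processing strings that may contain multiple consecutive spaces, and we want to reduce each run to a single space. We use a static compiled regex for this task: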
private static readonly Regex
regex_select_all_multiple_whitespace_chars =
new Regex(@"\s+",RegexOptions.Compiled);
and then use it as follows:
var cleanString=
regex_select_all_multiple_whitespace_chars.Replace(dirtyString.Trim(), " ");
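This line is invoked several million times and is proving to be fairly expensive. I have tried to write something better, but I am stumped. Given how modest the processing requirements of the regex are, surely something faster is possible. Could `unsafe` code with pointers speed things up further?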
Edit:
Thanks for the amazing set of responses to this question... most unexpected!
This is about three times faster:
private static string RemoveDuplicateSpaces(string text) {
StringBuilder b = new StringBuilder(text.Length);
bool space = false;
foreach (char c in text) {
if (c == ' ') {
if (!space) b.Append(c);
space = true;
} else {
b.Append(c);
space = false;
}
}
return b.ToString();
}
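Note that the loop above collapses only the `' '` character, whereas the question's `\s+` pattern combined with `Trim()` also covers tabs and line breaks and removes leading and trailing whitespace. A minimal sketch of a variant that comes closer to that behaviour follows; the class and method names (`WhitespaceCollapser`, `CollapseWhitespace`) are hypothetical, and it assumes `char.IsWhiteSpace` is an acceptable stand-in for `\s`. It has not been benchmarked against the versions in this thread.

```csharp
using System;

public static class WhitespaceCollapser
{
    // Hypothetical helper (not from the original thread): collapses every run
    // of whitespace into one space and drops leading/trailing whitespace.
    public static string CollapseWhitespace(string text)
    {
        // Assumption: null or empty input yields an empty string.
        if (string.IsNullOrEmpty(text)) return string.Empty;

        // The output can never be longer than the input.
        char[] buffer = new char[text.Length];
        int length = 0;
        bool pendingSpace = false;

        foreach (char c in text)
        {
            if (char.IsWhiteSpace(c))
            {
                // Remember a separator only once output has started,
                // which is what trims leading whitespace.
                pendingSpace = length > 0;
            }
            else
            {
                // Emit the single separator lazily, so trailing
                // whitespace is never written.
                if (pendingSpace)
                {
                    buffer[length++] = ' ';
                    pendingSpace = false;
                }
                buffer[length++] = c;
            }
        }

        return new string(buffer, 0, length);
    }
}
```

Writing into a preallocated `char[]` avoids the per-character bookkeeping of `StringBuilder.Append`; whether that matters in practice would need measuring on real data.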
How about this...
public string RemoveMultiSpace(string test)
{
var words = test.Split(new char[] { ' ' },
StringSplitOptions.RemoveEmptyEntries);
return string.Join(" ", words);
}
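The version above splits on `' '` only. If tabs and line breaks should be collapsed too, as with the `\s+` pattern in the question, one option is to pass a null separator array: `String.Split` then treats white-space characters as the delimiters. A small sketch, with the hypothetical names `SplitJoinCleaner` and `RemoveMultiWhitespace`:

```csharp
using System;

public static class SplitJoinCleaner
{
    // Hypothetical variant of RemoveMultiSpace: a null separator array makes
    // String.Split use white-space characters as delimiters.
    public static string RemoveMultiWhitespace(string text)
    {
        string[] words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // Joining with a single space also drops leading and trailing
        // whitespace, because empty entries were removed above.
        return string.Join(" ", words);
    }
}
```

The cast to `char[]` is only there to pick the right `Split` overload.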
Test case run with NUnit:
Test times are in milliseconds.
Regex Test time: 338,8885
RemoveMultiSpace Test time: 78,9335
private static readonly Regex regex_select_all_multiple_whitespace_chars =
new Regex(@"\s+", RegexOptions.Compiled);
[Test]
public void Test()
{
string startString = "A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F ";
string cleanString;
Trace.WriteLine("Regex Test start");
int count = 10000;
Stopwatch timer = new Stopwatch();
timer.Start();
for (int i = 0; i < count; i++)
{
cleanString = regex_select_all_multiple_whitespace_chars.Replace(startString, " ");
}
var elapsed = timer.Elapsed;
Trace.WriteLine("Regex Test end");
Trace.WriteLine("Regex Test time: " + elapsed.TotalMilliseconds);
Trace.WriteLine("RemoveMultiSpace Test start");
timer = new Stopwatch();
timer.Start();
for (int i = 0; i < count; i++)
{
cleanString = RemoveMultiSpace(startString);
}
elapsed = timer.Elapsed;
Trace.WriteLine("RemoveMultiSpace Test end");
Trace.WriteLine("RemoveMultiSpace Test time: " + elapsed.TotalMilliseconds);
}
public string RemoveMultiSpace(string test)
{
var words = test.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
return string.Join(" ", words);
}
Edit: I ran some more tests and added Guffa's StringBuilder-based method "RemoveDuplicateSpaces".
My conclusion is that the StringBuilder method is faster when there are a lot of spaces, but with fewer spaces the string split method is slightly faster.
Cleaning file with about 30000 lines, 10 iterations
RegEx time elapsed: 608,0623
RemoveMultiSpace time elapsed: 239,2049
RemoveDuplicateSpaces time elapsed: 307,2044
Cleaning string, 10000 iterations:
A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F
RegEx time elapsed: 590,3626
RemoveMultiSpace time elapsed: 159,4547
RemoveDuplicateSpaces time elapsed: 137,6816
Cleaning string, 10000 iterations:
A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F A B C D E F
RegEx time elapsed: 290,5666
RemoveMultiSpace time elapsed: 64,6776
RemoveDuplicateSpaces time elapsed: 52,4732
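
Timings like these depend on how the loop is measured. As a sketch only, here is one way to time each candidate with a warm-up call before the stopwatch starts, so that JIT compilation of the method is not counted. The names `MicroBenchmark` and `MeasureMilliseconds` are hypothetical, and this is not the harness that produced the numbers above.

```csharp
using System;
using System.Diagnostics;

public static class MicroBenchmark
{
    // Hypothetical helper: returns the elapsed time in milliseconds for
    // `iterations` calls of `clean` on `input`.
    public static double MeasureMilliseconds(Func<string, string> clean, string input, int iterations)
    {
        // Warm-up call so the first-use JIT cost is paid before timing.
        string result = clean(input);

        Stopwatch timer = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            result = clean(input);
        }
        timer.Stop();

        // Keep the last result referenced so the work is not treated as dead.
        GC.KeepAlive(result);
        return timer.Elapsed.TotalMilliseconds;
    }
}
```

A call would look like `MicroBenchmark.MeasureMilliseconds(RemoveMultiSpace, startString, 10000)`, repeated for each method under test.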