Geo*_*ker 6 c# string refactoring
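I've been tinkering with small functions on my own time, trying to find ways to refactor them (I recently read Martin Fowler's book "Refactoring: Improving the Design of Existing Code"). I found the following function, MakeNiceString(), while updating another part of the codebase near it, and it looked like a good candidate. As it stands there's no real reason to replace it, but it's small enough and does little enough that it's easy to follow while still giving me some good practice.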
private static string MakeNiceString(string str)
{
    char[] ca = str.ToCharArray();
    string result = null;
    int i = 0;
    result += System.Convert.ToString(ca[0]);
    for (i = 1; i <= ca.Length - 1; i++)
    {
        if (!(char.IsLower(ca[i])))
        {
            result += " ";
        }
        result += System.Convert.ToString(ca[i]);
    }
    return result;
}

static string SplitCamelCase(string str)
{
    string[] temp = Regex.Split(str, @"(?<!^)(?=[A-Z])");
    string result = String.Join(" ", temp);
    return result;
}
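The first function, MakeNiceString(), is the one I found in some code I was updating at work. Its purpose is to turn ThisIsAString into This Is A String. It's used in half a dozen places in the code, and is pretty insignificant in the grand scheme of things.
I built the second function purely as an academic exercise, to see whether using a regular expression would take longer.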
Well, here are the results:
With 10 iterations:
MakeNiceString took 2649 ticks
SplitCamelCase took 2502 ticks
However, it changes drastically over the long haul:
With 10,000 iterations:
MakeNiceString took 121625 ticks
SplitCamelCase took 443001 ticks
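Refactoring MakeNiceString()
The process of refactoring MakeNiceString() started with simply removing the conversions that were taking place. Doing that yielded the following results:
MakeNiceString took 124716 ticks
ImprovedMakeNiceString took 118486
Here's the code after Refactor #1: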
private static string ImprovedMakeNiceString(string str)
{ //Removed Convert.ToString()
    char[] ca = str.ToCharArray();
    string result = null;
    int i = 0;
    result += ca[0];
    for (i = 1; i <= ca.Length - 1; i++)
    {
        if (!(char.IsLower(ca[i])))
        {
            result += " ";
        }
        result += ca[i];
    }
    return result;
}
StringBuilder
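My second task was to use a StringBuilder instead of a String. Since String is immutable, unnecessary copies were being created throughout the loop. The code for that version is below, followed by its benchmark: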
static string RefactoredMakeNiceString(string str)
{
    char[] ca = str.ToCharArray();
    StringBuilder sb = new StringBuilder((str.Length * 5 / 4));
    int i = 0;
    sb.Append(ca[0]);
    for (i = 1; i <= ca.Length - 1; i++)
    {
        if (!(char.IsLower(ca[i])))
        {
            sb.Append(" ");
        }
        sb.Append(ca[i]);
    }
    return sb.ToString();
}
This results in the following benchmark:
MakeNiceString Took:           124497 Ticks   //Original
SplitCamelCase Took:           464459 Ticks   //Regex
ImprovedMakeNiceString Took:   117369 Ticks   //Remove Conversion
RefactoredMakeNiceString Took:  38542 Ticks   //Using StringBuilder
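To make the immutability point concrete, here is a minimal sketch (the class and method names are made up for illustration). It shows that `+=` on a string yields a new object and leaves the old one untouched, while `StringBuilder.Append` keeps working on the same builder.

```csharp
using System;
using System.Text;

public static class ImmutabilityDemo
{
    // True if += produced a new string object and left the old one unchanged.
    public static bool ConcatCreatesNewString()
    {
        string before = new string('a', 3);
        string after = before;   // both variables refer to the same object
        after += "b";            // allocates a new string; 'before' is untouched
        return !ReferenceEquals(before, after) && before == "aaa" && after == "aaab";
    }

    // True if Append modified the existing builder in place.
    public static bool AppendReusesBuilder()
    {
        StringBuilder sb = new StringBuilder("aaa");
        StringBuilder same = sb;
        sb.Append('b');          // mutates the builder's internal buffer
        return ReferenceEquals(sb, same) && same.ToString() == "aaab";
    }

    public static void Main()
    {
        Console.WriteLine(ConcatCreatesNewString());
        Console.WriteLine(AppendReusesBuilder());
    }
}
```

In a loop, each `+=` copies everything accumulated so far, which is the likely reason the StringBuilder version is so much faster in the numbers above.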
Changing the for loop to a foreach loop gave the following code and benchmark result:
static string RefactoredForEachMakeNiceString(string str)
{
    char[] ca = str.ToCharArray();
    StringBuilder sb1 = new StringBuilder((str.Length * 5 / 4));
    // Note: ca[0] is appended here and then visited again by the foreach below,
    // so as written the first character is emitted twice.
    sb1.Append(ca[0]);
    foreach (char c in ca)
    {
        if (!(char.IsLower(c)))
        {
            sb1.Append(" ");
        }
        sb1.Append(c);
    }
    return sb1.ToString();
}
RefactoredForEachMakeNiceString Took: 45163 Ticks
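As you can see, maintenance-wise the foreach loop is the easiest to maintain and has the "cleanest" look. It is slightly slower than the for loop, but much easier to follow.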
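One thing to watch in the foreach version above: it appends ca[0] before the loop and then the loop visits every element, including the first, so the first character ends up in the output twice. A minimal sketch that avoids this, assuming a non-null input and keeping the original `!char.IsLower` test (the names `ForeachSketch` and `SplitWithForeach` are hypothetical):

```csharp
using System;
using System.Text;

public static class ForeachSketch
{
    // Inserts a space before every character after the first that is not
    // lowercase. Assumes str is non-null; an empty string yields "".
    public static string SplitWithForeach(string str)
    {
        StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
        bool first = true;
        foreach (char c in str)   // foreach works directly on a string
        {
            if (!first && !char.IsLower(c))
            {
                sb.Append(' ');
            }
            sb.Append(c);
            first = false;
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(SplitWithForeach("ThisIsAUpperCaseString"));
    }
}
```

The `first` flag replaces the separate Append call before the loop, so no character is handled in two places.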
Regex
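I moved the Regex to right before the loop begins, in the hope that since it is only compiled once, it would execute faster. What I found (and I'm sure I have a bug somewhere) is that it doesn't speed up the way it ought to: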
static void runTest5()
{
    Regex rg = new Regex(@"(?<!^)(?=[A-Z])", RegexOptions.Compiled);
    for (int i = 0; i < 10000; i++)
    {
        CompiledRegex(rg, myString);
    }
}

static string CompiledRegex(Regex regex, string str)
{
    string result = null;
    Regex rg1 = regex;
    string[] temp = rg1.Split(str);
    result = String.Join(" ", temp);
    return result;
}
MakeNiceString Took                   139363 Ticks
SplitCamelCase Took                   489174 Ticks
ImprovedMakeNiceString Took           115478 Ticks
RefactoredMakeNiceString Took          38819 Ticks
RefactoredForEachMakeNiceString Took   44700 Ticks
CompiledRegex Took                    227021 Ticks
Or, if you prefer milliseconds:
MakeNiceString Took                    38 ms
SplitCamelCase Took                   123 ms
ImprovedMakeNiceString Took            33 ms
RefactoredMakeNiceString Took          11 ms
RefactoredForEachMakeNiceString Took   12 ms
CompiledRegex Took                     63 ms
So the percentage gains are:
MakeNiceString                    38 ms   Baseline
SplitCamelCase                   123 ms   223% slower
ImprovedMakeNiceString            33 ms   13.15% faster
RefactoredMakeNiceString          11 ms   71.05% faster
RefactoredForEachMakeNiceString   12 ms   68.42% faster
CompiledRegex                     63 ms   65.79% slower
(Please check my math)
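For anyone who does want to check the math, here is a small sketch that recomputes the percentages from the millisecond figures above. It assumes the figure of interest is the change relative to the 38 ms baseline, (measured - baseline) / baseline; the helper name is made up. Rounded to two places, the 33 ms case comes out as 13.16% rather than 13.15%, and the 123 ms case as 223.68%; the others match.

```csharp
using System;

public static class PercentChange
{
    // Percentage change relative to the baseline.
    // Positive means slower than the baseline, negative means faster.
    public static double RelativeToBaseline(double baselineMs, double measuredMs)
    {
        return (measuredMs - baselineMs) / baselineMs * 100.0;
    }

    public static void Main()
    {
        const double baseline = 38;
        double[] measured = { 123, 33, 11, 12, 63 };
        foreach (double ms in measured)
        {
            double pct = RelativeToBaseline(baseline, ms);
            Console.WriteLine("{0} ms: {1:F2}% {2}",
                ms, Math.Abs(pct), pct > 0 ? "slower" : "faster");
        }
    }
}
```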
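In the end, I'm going to replace what's there with RefactoredForEachMakeNiceString(), and while I'm at it, I'll rename it to something useful, like SplitStringOnUpperCase.
To benchmark, I simply start a new Stopwatch for each method call: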
string myString = "ThisIsAUpperCaseString";
Stopwatch sw = new Stopwatch();
sw.Start();
runTest();
sw.Stop();

static void runTest()
{
    for (int i = 0; i < 10000; i++)
    {
        MakeNiceString(myString);
    }
}
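With only 10 iterations, one-off costs such as JIT compilation may dominate the measurement, which could explain why the 10-iteration and 10,000-iteration results look so different. Below is a hedged sketch of a slightly more careful timing helper; `TimingSketch` and `MeasureTicks` are hypothetical names, and the single warm-up call is an assumption about what is worth excluding, not a full benchmarking methodology.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Times 'iterations' calls of 'function' and returns the elapsed Stopwatch ticks.
    public static long MeasureTicks(Func<string, string> function, string input, int iterations)
    {
        function(input); // warm-up call so one-time JIT cost is not included

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            function(input);
        }
        sw.Stop();
        return sw.ElapsedTicks;
    }

    public static void Main()
    {
        // Stand-in workload; substitute MakeNiceString or any other candidate.
        long ticks = MeasureTicks(s => s.ToUpperInvariant(), "ThisIsAUpperCaseString", 10000);
        Console.WriteLine("Took {0} ticks", ticks);
    }
}
```

Passing the method as a `Func<string, string>` means the same helper can time every variant without a separate runTest method for each.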
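Thanks for the responses so far. I've incorporated all of the suggestions made by @Jon Skeet, and would like feedback on the updated question.
NB: This question is meant to explore ways to refactor string handling functions in C#. I copied/pasted the first code as is. I'm well aware you can remove the System.Convert.ToString() calls in the first method, and I did just that. If anyone is aware of any implications of removing System.Convert.ToString(), that would also be helpful to know.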
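On the question of removing System.Convert.ToString(): as far as the output is concerned, the two forms should be equivalent, because appending a char to a string converts the char to its one-character string anyway. A small sketch to check this (the class and method names are made up for illustration):

```csharp
using System;

public static class CharAppendDemo
{
    // Appends via an explicit Convert.ToString(char), as in the original method.
    public static string WithConvert(char c)
    {
        string result = null;
        result += Convert.ToString(c);
        return result;
    }

    // Appends the char directly; the concatenation converts it to a string.
    public static string WithoutConvert(char c)
    {
        string result = null;
        result += c;
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(WithConvert('T') == WithoutConvert('T'));
    }
}
```

Both start from a null string, as the original code does; concatenation treats null as an empty string, so each returns "T".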
Jon*_*eet 17
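1) Use a StringBuilder, preferably with a reasonable initial capacity (e.g. string length * 5/4, to allow one extra space per four characters).
2) Try using a foreach loop instead of a for loop - it may well be simpler.
3) You don't need to convert the string into a char array first - foreach works over a string directly, or you can use the indexer.
4) Don't do extra string conversions everywhere - calling Convert.ToString(char) and then appending that string is pointless; there's no need for the single-character string.
5) For the second option, just build the regex once, outside the method. Try it with RegexOptions.Compiled as well.
EDIT: Okay, full benchmark results. I've tried a few more things, and also executed the code with rather more iterations to get a more accurate result. This was only run on an Eee PC, so no doubt it'll run faster on "real" PCs, but I suspect the broad results still hold. First the code: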
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Reflection;
using System.Text;
using System.Text.RegularExpressions;

class Benchmark
{
    const string TestData = "ThisIsAUpperCaseString";
    const string ValidResult = "This Is A Upper Case String";
    const int Iterations = 1000000;

    static void Main(string[] args)
    {
        Test(BenchmarkOverhead);
        Test(MakeNiceString);
        Test(ImprovedMakeNiceString);
        Test(RefactoredMakeNiceString);
        Test(MakeNiceStringWithStringIndexer);
        Test(MakeNiceStringWithForeach);
        Test(MakeNiceStringWithForeachAndLinqSkip);
        Test(MakeNiceStringWithForeachAndCustomSkip);
        Test(SplitCamelCase);
        Test(SplitCamelCaseCachedRegex);
        Test(SplitCamelCaseCompiledRegex);
    }

    static void Test(Func<string,string> function)
    {
        Console.Write("{0}... ", function.Method.Name);
        Stopwatch sw = Stopwatch.StartNew();
        for (int i=0; i < Iterations; i++)
        {
            string result = function(TestData);
            if (result.Length != ValidResult.Length)
            {
                throw new Exception("Bad result: " + result);
            }
        }
        sw.Stop();
        Console.WriteLine(" {0}ms", sw.ElapsedMilliseconds);
        GC.Collect();
    }

    private static string BenchmarkOverhead(string str)
    {
        return ValidResult;
    }

    private static string MakeNiceString(string str)
    {
        char[] ca = str.ToCharArray();
        string result = null;
        int i = 0;
        result += System.Convert.ToString(ca[0]);
        for (i = 1; i <= ca.Length - 1; i++)
        {
            if (!(char.IsLower(ca[i])))
            {
                result += " ";
            }
            result += System.Convert.ToString(ca[i]);
        }
        return result;
    }

    private static string ImprovedMakeNiceString(string str)
    { //Removed Convert.ToString()
        char[] ca = str.ToCharArray();
        string result = null;
        int i = 0;
        result += ca[0];
        for (i = 1; i <= ca.Length - 1; i++)
        {
            if (!(char.IsLower(ca[i])))
            {
                result += " ";
            }
            result += ca[i];
        }
        return result;
    }

    private static string RefactoredMakeNiceString(string str)
    {
        char[] ca = str.ToCharArray();
        StringBuilder sb = new StringBuilder((str.Length * 5 / 4));
        int i = 0;
        sb.Append(ca[0]);
        for (i = 1; i <= ca.Length - 1; i++)
        {
            if (!(char.IsLower(ca[i])))
            {
                sb.Append(" ");
            }
            sb.Append(ca[i]);
        }
        return sb.ToString();
    }

    private static string MakeNiceStringWithStringIndexer(string str)
    {
        StringBuilder sb = new StringBuilder((str.Length * 5 / 4));
        sb.Append(str[0]);
        for (int i = 1; i < str.Length; i++)
        {
            char c = str[i];
            if (!(char.IsLower(c)))
            {
                sb.Append(" ");
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    private static string MakeNiceStringWithForeach(string str)
    {
        StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
        bool first = true;
        foreach (char c in str)
        {
            if (!first && char.IsUpper(c))
            {
                sb.Append(" ");
            }
            sb.Append(c);
            first = false;
        }
        return sb.ToString();
    }

    private static string MakeNiceStringWithForeachAndLinqSkip(string str)
    {
        StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
        sb.Append(str[0]);
        foreach (char c in str.Skip(1))
        {
            if (char.IsUpper(c))
            {
                sb.Append(" ");
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    private static string MakeNiceStringWithForeachAndCustomSkip(string str)
    {
        StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
        sb.Append(str[0]);
        foreach (char c in new SkipEnumerable<char>(str, 1))
        {
            if (char.IsUpper(c))
            {
                sb.Append(" ");
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    private static string SplitCamelCase(string str)
    {
        string[] temp = Regex.Split(str, @"(?<!^)(?=[A-Z])");
        string result = String.Join(" ", temp);
        return result;
    }

    private static readonly Regex CachedRegex = new Regex("(?<!^)(?=[A-Z])");

    private static string SplitCamelCaseCachedRegex(string str)
    {
        string[] temp = CachedRegex.Split(str);
        string result = String.Join(" ", temp);
        return result;
    }

    private static readonly Regex CompiledRegex =
        new Regex("(?<!^)(?=[A-Z])", RegexOptions.Compiled);

    private static string SplitCamelCaseCompiledRegex(string str)
    {
        string[] temp = CompiledRegex.Split(str);
        string result = String.Join(" ", temp);
        return result;
    }

    private class SkipEnumerable<T> : IEnumerable<T>
    {
        private readonly IEnumerable<T> original;
        private readonly int skip;

        public SkipEnumerable(IEnumerable<T> original, int skip)
        {
            this.original = original;
            this.skip = skip;
        }

        public IEnumerator<T> GetEnumerator()
        {
            IEnumerator<T> ret = original.GetEnumerator();
            for (int i=0; i < skip; i++)
            {
                ret.MoveNext();
            }
            return ret;
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}
Now the results:
BenchmarkOverhead... 22ms
MakeNiceString... 10062ms
ImprovedMakeNiceString... 12367ms
RefactoredMakeNiceString... 3489ms
MakeNiceStringWithStringIndexer... 3115ms
MakeNiceStringWithForeach... 3292ms
MakeNiceStringWithForeachAndLinqSkip... 5702ms
MakeNiceStringWithForeachAndCustomSkip... 4490ms
SplitCamelCase... 68267ms
SplitCamelCaseCachedRegex... 52529ms
SplitCamelCaseCompiledRegex... 26806ms
As you can see, the string indexer version is the winner - it's also pretty simple code.
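If the indexer version is the one that ends up in production, note that `str[0]` throws on an empty string and on null. A minimal sketch with a guard added, using the `SplitStringOnUpperCase` name suggested in the question and assuming that returning the input unchanged is acceptable for null or empty strings (the wrapper class name is hypothetical):

```csharp
using System;
using System.Text;

public static class StringSplitting
{
    // Indexer-based variant with a guard for null or empty input.
    public static string SplitStringOnUpperCase(string str)
    {
        if (string.IsNullOrEmpty(str))
        {
            return str; // assumption: nothing to split, hand the input back
        }

        StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
        sb.Append(str[0]);
        for (int i = 1; i < str.Length; i++)
        {
            char c = str[i];
            if (!char.IsLower(c))
            {
                sb.Append(' ');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(SplitStringOnUpperCase("ThisIsAUpperCaseString"));
    }
}
```

Like the original, this puts a space before every non-lowercase character, so a run of capitals such as "HTTPServer" becomes "H T T P Server".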
Hope this helps... and don't forget, there are bound to be other options I haven't thought of!