C#: remove carriage returns, line feeds and spaces from a string as efficiently as possible (benchmark)

Dan*_*iel 2 .net c# string removing-whitespace benchmarkdotnet

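In C#, I have a string that contains spaces, carriage returns and/or line feeds. Is there a simple way to normalize large strings imported from a text file (100,000 to 1,000,000 characters) as efficiently as possible?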

To clarify what I mean: suppose my string looks like string1, but I want it to look like string2:

```csharp
string1 = " ab c\r\n de.\nf";
string2 = "abcde.f";
```

Gur*_*ron 6

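What counts as "efficient" depends heavily on your actual strings and how many of them you process. I put together the following benchmark (for BenchmarkDotNet). It uses the short sample string from the question, so the absolute numbers say little about 100,000 to 1,000,000 character inputs: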

```csharp
using System.Text;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;

public class Replace
{
    // The sample input from the question.
    private static readonly string S = " ab c\r\n de.\nf";

    // \s matches any whitespace (including tabs), not only ' ', '\r' and '\n'.
    private static readonly Regex Reg = new Regex(@"\s+", RegexOptions.Compiled);

    [Benchmark]
    public string SimpleReplace() => S
       .Replace(" ", "")
       .Replace("\r", "")   // carriage return; "\\r" would be a backslash followed by 'r'
       .Replace("\n", "");  // line feed

    [Benchmark]
    public string StringBuilder() => new StringBuilder().Append(S)
       .Replace(" ", "")
       .Replace("\r", "")
       .Replace("\n", "")
       .ToString();

    [Benchmark]
    public string RegexReplace() => Reg.Replace(S, "");

    [Benchmark]
    public string NewString()
    {
        // Worst case nothing is removed, so S.Length is always enough room.
        var arr = new char[S.Length];
        var cnt = 0;
        for (int i = 0; i < S.Length; i++)
        {
            switch (S[i])
            {
                case ' ':
                case '\r':
                case '\n':
                    // Skip these characters.
                    break;

                default:
                    arr[cnt] = S[i];
                    cnt++;
                    break;
            }
        }

        // Build the result from the kept characters only.
        return new string(arr, 0, cnt);
    }

    [Benchmark]
    public string NewStringForeach()
    {
        var validCharacters = new char[S.Length];
        var next = 0;

        foreach (var c in S)
        {
            switch (c)
            {
                case ' ':
                case '\r':
                case '\n':
                    // Skip these characters.
                    break;

                default:
                    validCharacters[next++] = c;
                    break;
            }
        }

        return new string(validCharacters, 0, next);
    }
}
```
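Because writing "\\r" instead of "\r" silently turns a removal into a no-op, it is worth checking that every variant returns the expected string before comparing timings. The sketch below is a hypothetical helper (the class and method names are made up for illustration and are not part of BenchmarkDotNet); it runs each approach once on the sample input and contrasts the escaped literals "\\r"/"\\n" with the real control characters.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sanity check, not part of the original answer.
public static class ReplaceSanityCheck
{
    private const string Input = " ab c\r\n de.\nf";

    // "\\r" is the two characters '\' and 'r', so these calls match nothing
    // in Input and only the spaces are removed.
    public static string WithEscapedLiterals() => Input
        .Replace(" ", "")
        .Replace("\\r", "")
        .Replace("\\n", "");

    // "\r" and "\n" are the single carriage return and line feed characters.
    public static string WithControlChars() => Input
        .Replace(" ", "")
        .Replace("\r", "")
        .Replace("\n", "");

    public static string WithStringBuilder() => new StringBuilder(Input)
        .Replace(" ", "")
        .Replace("\r", "")
        .Replace("\n", "")
        .ToString();

    // Static Regex.Replace with the same pattern the benchmark uses.
    public static string WithRegex() => Regex.Replace(Input, @"\s+", "");
}
```

With this input, `WithEscapedLiterals()` returns `"abc\r\nde.\nf"` (only the spaces are gone), while the other three return `"abcde.f"`. Calling these from a unit test or a small console app before running the benchmark catches this kind of mistake cheaply.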

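On my machine this gives the results below. Caveat: the timed SimpleReplace and StringBuilder versions passed "\\r" and "\\n" (a backslash followed by a letter), which never match the carriage return and line feed in S, so those two rows measure calls that removed only the spaces and likely understate the real cost: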

|          Method |        Mean |     Error |    StdDev |
|---------------- |------------:|----------:|----------:|
|   SimpleReplace |   122.09 ns |  1.273 ns |  1.063 ns |
|   StringBuilder |   311.28 ns |  6.313 ns |  8.850 ns |
|    RegexReplace | 1,194.91 ns | 23.376 ns | 34.265 ns |
|       NewString |    52.26 ns |  1.122 ns |  1.812 ns |
|NewStringForeach |    40.04 ns |  0.877 ns |  1.979 ns |
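For the file-sized inputs in the question, the single-pass loop can be wrapped in a reusable method that takes the string as a parameter. The sketch below is a hypothetical helper (`WhitespaceStripper.RemoveSpacesAndLineBreaks` is an illustrative name) using the same technique as `NewStringForeach`. One assumption-driven change: it rents its scratch buffer from `System.Buffers.ArrayPool<char>.Shared` instead of allocating `new char[input.Length]`, because a 100,000 to 1,000,000 element `char[]` is large enough to land on the large object heap on every call. Whether that actually pays off here has not been measured.

```csharp
using System;
using System.Buffers;

// Hypothetical helper, not from the original answer.
public static class WhitespaceStripper
{
    public static string RemoveSpacesAndLineBreaks(string input)
    {
        if (input is null) throw new ArgumentNullException(nameof(input));
        if (input.Length == 0) return string.Empty;

        // Rent may hand back an array longer than requested; only the first
        // `count` elements are used.
        char[] buffer = ArrayPool<char>.Shared.Rent(input.Length);
        try
        {
            int count = 0;
            foreach (char c in input)
            {
                // Drop only space, CR and LF; tabs and other whitespace are
                // kept, unlike the \s+ regex.
                if (c != ' ' && c != '\r' && c != '\n')
                {
                    buffer[count++] = c;
                }
            }

            // Copy the kept characters into the final string.
            return new string(buffer, 0, count);
        }
        finally
        {
            // Give the buffer back to the pool (contents are not cleared).
            ArrayPool<char>.Shared.Return(buffer);
        }
    }
}
```

For example, `WhitespaceStripper.RemoveSpacesAndLineBreaks(" ab c\r\n de.\nf")` returns `"abcde.f"`. To compare it fairly with the other variants, the benchmark would need an input of realistic size (for instance the contents of one of the actual text files) rather than the 14-character sample.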