Why is Regex.IsMatch(str) faster than str.EndsWith (invariant culture)?

Eug*_*sky 10 .net regex string performance

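This is a micro-benchmark for a hot code path that is traversed an enormous number of times and needs to be fast.

For the code snippet below, comparing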

  • x.EndsWith(y, InvariantCulture)
  • Regex(y, Compiled | CultureInvariant).IsMatch(x)

I get the following numbers:

=============================
Regex   : 00:00:01.2235890. Ignore this: 16666666
EndsWith: 00:00:03.2194626. Ignore this: 16666666
=============================
Regex   : 00:00:01.0979105. Ignore this: 16666666
EndsWith: 00:00:03.2346031. Ignore this: 16666666
=============================
Regex   : 00:00:01.0687845. Ignore this: 16666666
EndsWith: 00:00:03.3199213. Ignore this: 16666666

In other words, EndsWith takes about 3 times as long as Regex.

I should note that I tried other values, and depending on the string values used, sometimes EndsWith is faster and sometimes Regex is.

EndsWith(x, InvariantCulture) boils down to some argument checking and then extern int nativeCompareOrdinalEx(String, int, String, int, int), which I would expect to be fast. (As @nhahtdh correctly pointed out, in the InvariantCulture case it calls CultureInfo.InvariantCulture.CompareInfo.IsSuffix, which calls InternalFindNLSStringEx. I accidentally traced the Ordinal path.)

Note: I just found out that when EndsWith is called with Ordinal instead of InvariantCulture, EndsWith is much faster than Regex... Unfortunately there is no RegexOptions.Ordinal to compare it against.
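To put the Ordinal observation next to the other two measurements, the timing loop can be extended with a third case. This is only a minimal sketch: the Time helper, the class name, the shortened ID list and the smaller reps value are hypothetical and not part of the original benchmark, and the delegate call adds overhead of its own, so only the relative ordering is meaningful.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class OrdinalComparisonSketch
{
    // Hypothetical helper: applies `test` to the IDs `reps` times and returns the elapsed time.
    static TimeSpan Time(string[] ids, int reps, Func<string, bool> test, out int sink)
    {
        sink = 0;
        var sw = Stopwatch.StartNew();
        for (int j = 0; j < reps; j++)
        {
            // Accumulate a value so the calls are not optimized away.
            sink += test(ids[j % ids.Length]) ? 1 : 2;
        }
        return sw.Elapsed;
    }

    static void Main()
    {
        // Assumption: a shortened version of the ID list from the question.
        string[] ids =
        {
            "zxc@x@432143214@O@abcße",
            "qwe@x@432143214@XXabc",
            "zxc@x@1234@O@aXcße",
        };
        var suffix = "@abcße";
        var regex = new Regex("@abcße$", RegexOptions.Compiled | RegexOptions.CultureInvariant);
        int reps = 2000000; // smaller than the original run, to keep the sketch quick

        int x;
        var invariant = Time(ids, reps, s => s.EndsWith(suffix, StringComparison.InvariantCulture), out x);
        Console.WriteLine("EndsWith (InvariantCulture): " + invariant + ". Ignore this: " + x);

        var ordinal = Time(ids, reps, s => s.EndsWith(suffix, StringComparison.Ordinal), out x);
        Console.WriteLine("EndsWith (Ordinal)         : " + ordinal + ". Ignore this: " + x);

        var viaRegex = Time(ids, reps, s => regex.IsMatch(s), out x);
        Console.WriteLine("Regex                      : " + viaRegex + ". Ignore this: " + x);
    }
}
```

The three cases share one loop so that the indexing and accumulation cost is the same for each of them; the absolute times will differ from the numbers above.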

I was also expecting the compiled regex to be fast, but how can it beat the specialized method?

The code:

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

string[] BunchOfIDs =
{
    "zxc@x@432143214@O@abcße",
    "zxc@x@432143214@T@abcßX",
    "qwe@x@432143214@O@abcße",
    "qwe@x@432143214@XXabc",
    "zxc@x@1234@O@aXcße",
    "qwe@y@1234@O@aYcße",
};

var endsWith = "@abcße";
var endsWithRegex = new Regex("@abcße$", RegexOptions.None);

int reps = 20000000;
for (int i = 0; i < 3; i++)
{
    Console.WriteLine("=============================");
    int x = 0;
    var sw = Stopwatch.StartNew();
    for (int j = 0; j < reps; j++)
    {
        x += BunchOfIDs[j % BunchOfIDs.Length].EndsWith(endsWith, StringComparison.InvariantCulture) ? 1 : 2;
    }
    Console.WriteLine("EndsWith: " + sw.Elapsed + ". Ignore this: " + x);

    x = 0;
    sw = Stopwatch.StartNew();
    for (int j = 0; j < reps; j++)
    {
        x += endsWithRegex.IsMatch(BunchOfIDs[j % BunchOfIDs.Length]) ? 1 : 2;
    }
    Console.WriteLine("Regex   : " + sw.Elapsed + ". Ignore this: " + x);
}

Eug*_*sky 5

It might be because StringComparison.InvariantCulture != RegexOptions.CultureInvariant!

This snippet

var str = "ss";
var endsWith = "ß";
var endsWithRegex = new Regex("ß$",
    RegexOptions.Compiled | RegexOptions.CultureInvariant);
Console.WriteLine(str.EndsWith(endsWith, StringComparison.InvariantCulture)
    + " vs "
    + endsWithRegex.IsMatch(str));

prints

True vs False

So it looks like RegexOptions.CultureInvariant does not imply what StringComparison.InvariantCulture implies. Is RegexOptions.CultureInvariant perhaps more like StringComparison.Ordinal?
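One way to probe that guess is to print all three comparisons side by side for the same input. This is a minimal sketch that reuses the strings from the snippet above; the class name is hypothetical. The InvariantCulture result may depend on the runtime and its globalization data, so no output is assumed here.

```csharp
using System;
using System.Text.RegularExpressions;

class ComparisonSemanticsSketch
{
    static void Main()
    {
        var str = "ss";
        var suffix = "ß";
        var regex = new Regex("ß$", RegexOptions.Compiled | RegexOptions.CultureInvariant);

        // Linguistic comparison: may treat "ß" and "ss" as equivalent.
        Console.WriteLine("EndsWith (InvariantCulture): "
            + str.EndsWith(suffix, StringComparison.InvariantCulture));

        // Ordinal comparison: compares UTF-16 code units, so "ss" does not end with "ß".
        Console.WriteLine("EndsWith (Ordinal)         : "
            + str.EndsWith(suffix, StringComparison.Ordinal));

        // Regex: the literal "ß" in the pattern has to match the character "ß" itself.
        Console.WriteLine("Regex (CultureInvariant)   : "
            + regex.IsMatch(str));
    }
}
```

If the Ordinal and Regex lines agree with each other while the InvariantCulture line differs, that supports the idea that the two calls in the benchmark are not doing equivalent work.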