dr.*_*vil 23
.NET regex performance
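I came across this article: Performance: Compiled vs. Interpreted Regular Expressions. I modified the sample code to compile 1000 Regexes and then run each of them 500 times to take advantage of the precompilation, but even in that case the interpreted Regexes run 4 times faster!
This means that the biggest difference is due to JIT: once JIT compilation of the regexes is accounted for, the code below still runs somewhat slower, which makes no sense to me, but @Jim provided a much cleaner version in his answer that works as expected. As originally measured, the RegexOptions.Compiled option is completely useless; in fact it's even worse, since it's slower!
Can anyone explain why this is the case?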
Code taken from the blog post and modified:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace RegExTester
{
    class Program
    {
        static void Main(string[] args)
        {
            DateTime startTime = DateTime.Now;
            for (int i = 0; i < 1000; i++)
            {
                CheckForMatches("some random text with email address, address@domain200.com" + i.ToString());
            }
            double msTaken = DateTime.Now.Subtract(startTime).TotalMilliseconds;
            Console.WriteLine("Full Run: " + msTaken);

            startTime = DateTime.Now;
            for (int i = 0; i < 1000; i++)
            {
                CheckForMatches("some random text with email address, address@domain200.com" + i.ToString());
            }
            msTaken = DateTime.Now.Subtract(startTime).TotalMilliseconds;
            Console.WriteLine("Full Run: " + msTaken);
            Console.ReadLine();
        }

        private static List<Regex> _expressions;
        private static object _SyncRoot = new object();

        private static List<Regex> GetExpressions()
        {
            if (_expressions != null)
                return _expressions;

            lock (_SyncRoot)
            {
                if (_expressions == null)
                {
                    DateTime startTime = DateTime.Now;
                    List<Regex> tempExpressions = new List<Regex>();
                    string regExPattern =
                        @"^[a-zA-Z0-9]+[a-zA-Z0-9._%-]*@{0}$";

                    for (int i = 0; i < 2000; i++)
                    {
                        tempExpressions.Add(new Regex(
                            string.Format(regExPattern,
                                Regex.Escape("domain" + i.ToString() + "." +
                                    (i % 3 == 0 ? ".com" : ".net"))),
                            RegexOptions.IgnoreCase)); // | RegexOptions.Compiled
                    }

                    _expressions = new List<Regex>(tempExpressions);
                    DateTime endTime = DateTime.Now;
                    double msTaken = endTime.Subtract(startTime).TotalMilliseconds;
                    Console.WriteLine("Init:" + msTaken);
                }
            }

            return _expressions;
        }

        static List<Regex> expressions = GetExpressions();

        private static void CheckForMatches(string text)
        {
            DateTime startTime = DateTime.Now;
            foreach (Regex e in expressions)
            {
                bool isMatch = e.IsMatch(text);
            }
            DateTime endTime = DateTime.Now;
            //double msTaken = endTime.Subtract(startTime).TotalMilliseconds;
            //Console.WriteLine("Run: " + msTaken);
        }
    }
}
Jim*_*hel 38
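Compiled regular expressions match faster when used as intended. As others have pointed out, the idea is to compile them once and use them many times. The construction and initialization time is then amortized over those many runs.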
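To put the amortization argument in symbols (C, t and n are illustrative names, not taken from the thread): if building a regex costs C once and each match costs t, then the average cost per match over n matches is

```latex
\bar{t}(n) = \frac{C + n\,t}{n} = t + \frac{C}{n}
```

The one-time term C/n shrinks as n grows, so for large n only the per-match cost t matters, and that is where a faster per-match time for a compiled regex can pay off.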
I created a simpler test that shows compiled regular expressions are unquestionably faster than non-compiled ones.
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class Program
{
    const int NumIterations = 1000;
    const string TestString = "some random text with email address, address@domain200.com";
    const string Pattern = "^[a-zA-Z0-9]+[a-zA-Z0-9._%-]*@domain0\\.\\.com$";
    private static Regex NormalRegex = new Regex(Pattern, RegexOptions.IgnoreCase);
    private static Regex CompiledRegex = new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
    private static Regex DummyRegex = new Regex("^.$");

    static void Main(string[] args)
    {
        // Times 'count' calls of r.IsMatch on slightly varying inputs.
        var DoTest = new Action<string, Regex, int>((s, r, count) =>
        {
            Console.Write("Testing {0} ... ", s);
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < count; ++i)
            {
                bool isMatch = r.IsMatch(TestString + i.ToString());
            }
            sw.Stop();
            Console.WriteLine("{0:N0} ms", sw.ElapsedMilliseconds);
        });

        // Make sure that DoTest is JITed
        DoTest("Dummy", DummyRegex, 1);
        DoTest("Normal first time", NormalRegex, 1);
        DoTest("Normal Regex", NormalRegex, NumIterations);
        DoTest("Compiled first time", CompiledRegex, 1);
        DoTest("Compiled", CompiledRegex, NumIterations);

        Console.WriteLine();
        Console.Write("Done. Press Enter:");
        Console.ReadLine();
    }
}
Setting NumIterations to 500 gives me this:
Testing Dummy ... 0 ms
Testing Normal first time ... 0 ms
Testing Normal Regex ... 1 ms
Testing Compiled first time ... 13 ms
Testing Compiled ... 1 ms
With 5 million iterations, I get:
Testing Dummy ... 0 ms
Testing Normal first time ... 0 ms
Testing Normal Regex ... 17,232 ms
Testing Compiled first time ... 17 ms
Testing Compiled ... 15,299 ms
Here you can see that the compiled regex is at least 10% faster than the non-compiled version.
Interestingly, if you remove RegexOptions.IgnoreCase from the regex, the results with 5 million iterations are even more striking:
Testing Dummy ... 0 ms
Testing Normal first time ... 0 ms
Testing Normal Regex ... 12,869 ms
Testing Compiled first time ... 14 ms
Testing Compiled ... 8,332 ms
Here, the compiled regex is 35% faster than the non-compiled one.
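Expressed as the reduction in total matching time relative to the non-compiled regex, the 5-million-iteration figures above work out as:

```latex
\text{with IgnoreCase:}\quad \frac{17{,}232 - 15{,}299}{17{,}232} = \frac{1{,}933}{17{,}232} \approx 11.2\%
\\
\text{without IgnoreCase:}\quad \frac{12{,}869 - 8{,}332}{12{,}869} = \frac{4{,}537}{12{,}869} \approx 35.3\%
```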
In my opinion, the blog post you cited is simply a flawed test.
http://www.codinghorror.com/blog/2005/03/to-compile-or-not-to-compile.html
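Compilation only helps if you instantiate the regex once and reuse it many times. If you create a compiled regex inside a for loop, it will obviously perform worse. Can you show us your sample code?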
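A minimal C# sketch of that difference (the class, method names and the example.com pattern are hypothetical, chosen to resemble the e-mail check in this thread): the first method builds a new compiled Regex on every iteration, so the compilation cost is paid again each time; the second builds a single instance and reuses it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class: illustrates reuse vs. rebuilding, not a benchmark.
public static class RegexReuseDemo
{
    private const string Pattern = @"^[a-zA-Z0-9]+[a-zA-Z0-9._%-]*@example\.com$";

    // Anti-pattern: a new compiled Regex is constructed for every input,
    // so its one-time construction cost is never amortized.
    public static int CountMatchesRebuildingEachTime(IEnumerable<string> inputs)
    {
        int count = 0;
        foreach (string input in inputs)
        {
            var regex = new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
            if (regex.IsMatch(input)) count++;
        }
        return count;
    }

    // Preferred: construct the compiled Regex once and reuse it for every input.
    private static readonly Regex SharedRegex =
        new Regex(Pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static int CountMatchesReusingInstance(IEnumerable<string> inputs)
    {
        int count = 0;
        foreach (string input in inputs)
        {
            if (SharedRegex.IsMatch(input)) count++;
        }
        return count;
    }
}
```

Both methods return the same count for the same inputs; only the cost differs, and the gap grows with the number of inputs.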
The problem with this benchmark is that compiled Regexes carry the overhead of creating an entirely new assembly and loading it into the AppDomain.
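One way to keep that overhead from being folded into the per-match numbers is to time it separately. The helper below is a hypothetical sketch (the names are illustrative): it times construction plus one warm-up match, then times the matching loop on its own, using Stopwatch rather than DateTime.Now.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper: reports one-time cost and matching cost separately.
public static class RegexTiming
{
    public sealed class Result
    {
        public TimeSpan Construction { get; set; } // constructor + first IsMatch call
        public TimeSpan Matching { get; set; }     // the timed loop only
        public int Matches { get; set; }
    }

    public static Result Measure(string pattern, RegexOptions options, string input, int iterations)
    {
        var sw = Stopwatch.StartNew();
        var regex = new Regex(pattern, options);
        // Warm-up call, so that first-use costs are counted with construction
        // rather than inside the matching loop.
        regex.IsMatch(input);
        sw.Stop();
        TimeSpan construction = sw.Elapsed;

        int matches = 0;
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            if (regex.IsMatch(input)) matches++;
        }
        sw.Stop();

        return new Result { Construction = construction, Matching = sw.Elapsed, Matches = matches };
    }
}
```

For example, calling Measure with RegexOptions.IgnoreCase and again with RegexOptions.IgnoreCase | RegexOptions.Compiled on the same pattern and input shows the construction and matching costs of each variant side by side. Single short runs are still noisy, so larger iteration counts give more meaningful numbers.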
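Compiled Regexes were designed (I believe; I didn't design them) for scenarios with hundreds of Regexes executed millions of times, not thousands of Regexes executed thousands of times. If you aren't going to execute a regex something like a million times, you probably won't even make up the time spent JIT-compiling it.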
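A rough break-even point can be sketched in symbols (illustrative names: C is the extra one-time cost of the compiled regex, t_i and t_c are the per-match times of the interpreted and compiled versions, with t_i > t_c). Compilation pays off once

```latex
C + n\,t_c < n\,t_i \iff n > n^{*} = \frac{C}{t_i - t_c}
\\
n^{*} \approx \frac{13\ \text{ms}}{1{,}933\ \text{ms} / (5 \times 10^{6})} \approx \frac{13}{3.87 \times 10^{-4}} \approx 3.4 \times 10^{4}
```

The second line is only an order-of-magnitude illustration: it uses Jim's 5-million-iteration IgnoreCase figures and assumes the roughly 13 ms "Compiled first time" result approximates C. The real C depends on the pattern and runtime, and every separately compiled regex pays its own C. If a comparable C applied to the question's code, each of its regexes is matched only 2,000 times (two runs of 1,000 calls), well below such a break-even point.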