Converting a MatchCollection to a string array

Vil*_*Vil 70 c# regex arrays

Is there a better way to convert a MatchCollection to a string array than this?

MatchCollection mc = Regex.Matches(strText, @"\b[A-Za-z-']+\b");
string[] strArray = new string[mc.Count];
for (int i = 0; i < mc.Count;i++ )
{
    strArray[i] = mc[i].Groups[0].Value;
}

PS: mc.CopyTo(strArray, 0) throws an exception:

At least one element in the source array could not be cast down to the destination array type.
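
The exception above is presumably raised because the collection holds Match objects, and those cannot be stored in a string[]. A minimal sketch of one way around it, assuming a hypothetical helper named ToStringArray and sample text of my own: copy into a Match[] first, then project each element to its Value with Array.ConvertAll.

```csharp
using System;
using System.Text.RegularExpressions;

static class CopyToSketch
{
    // Hypothetical helper, not from the question: copies the matches into a
    // Match[] (an element type that CopyTo can actually fill) and then
    // converts each Match to its matched text.
    public static string[] ToStringArray(string strText)
    {
        MatchCollection mc = Regex.Matches(strText, @"\b[A-Za-z-']+\b");

        Match[] matches = new Match[mc.Count];
        mc.CopyTo(matches, 0);

        return Array.ConvertAll(matches, m => m.Value);
    }

    static void Main()
    {
        // Sample input is a placeholder chosen for illustration.
        Console.WriteLine(string.Join("|", ToStringArray("alpha beta gamma")));
    }
}
```

This still walks the collection twice (once for the copy, once for the conversion), so it is not obviously better than the plain loop in the question.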

Dav*_*ish 145

Try:

var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
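
A side note as a sketch, under the assumption that the code targets a newer runtime: to my knowledge MatchCollection implements IEnumerable<Match> on .NET Core 2.0 and later (not on .NET Framework), in which case the Cast<Match>() step can be dropped. The helper name Words is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class DirectSelectSketch
{
    // Hypothetical helper. Assumes MatchCollection is IEnumerable<Match>,
    // which holds on .NET Core 2.0+ as far as I know; on .NET Framework
    // this will not compile and Cast<Match>() is still needed.
    public static string[] Words(string text) =>
        Regex.Matches(text, @"\b[A-Za-z-']+\b")
             .Select(m => m.Value)
             .ToArray();

    static void Main()
    {
        // Sample input is a placeholder chosen for illustration.
        Console.WriteLine(string.Join(", ", Words("alpha beta gamma")));
    }
}
```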

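  • @Alex You know that everything returned is a `Match`, so there is no need to check it again at run time. `Cast` makes more sense. (3 upvotes)
  • @DaveBish I posted some benchmark code below; `OfType<>` turned out slightly faster. (2 upvotes)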
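
To make the semantic difference discussed in these comments concrete, here is a small sketch on a deliberately mixed collection (not a MatchCollection, where every element is a Match and both calls yield the same sequence). OfType<T>() skips elements that are not of type T, while Cast<T>() throws on them. The class and method names are hypothetical.

```csharp
using System;
using System.Collections;
using System.Linq;

static class CastVsOfTypeSketch
{
    // OfType filters: elements that are not strings are silently skipped.
    public static string[] ViaOfType(IEnumerable source) =>
        source.OfType<string>().ToArray();

    // Cast asserts: the first element that is not a string causes an
    // InvalidCastException when the sequence is enumerated.
    public static string[] ViaCast(IEnumerable source) =>
        source.Cast<string>().ToArray();

    static void Main()
    {
        var mixed = new ArrayList { "a", 42, "b" };

        Console.WriteLine(string.Join(",", ViaOfType(mixed)));

        try
        {
            ViaCast(mixed);
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("Cast threw InvalidCastException");
        }
    }
}
```

For a MatchCollection the choice is therefore mostly about intent: Cast states that every element is expected to be a Match, OfType tolerates elements that are not.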

Ale*_*lex 28

Dave Bish's answer is good and works correctly.

It's worth noting, though, that replacing Cast<Match>() with OfType<Match>() was faster in my benchmark.

The code becomes:

var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
    .OfType<Match>()
    .Select(m => m.Groups[0].Value)
    .ToArray();

The result is exactly the same (and it solves the OP's problem in exactly the same way), but for large strings it is faster.

Test code:

// put it in a console application
static void Test()
{
    Stopwatch sw = new Stopwatch();
    StringBuilder sb = new StringBuilder();
    string strText = "this will become a very long string after my code has done appending it to the stringbuilder ";

    Enumerable.Range(1, 100000).ToList().ForEach(i => sb.Append(strText));
    strText = sb.ToString();

    sw.Start();
    var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
              .OfType<Match>()
              .Select(m => m.Groups[0].Value)
              .ToArray();
    sw.Stop();

    Console.WriteLine("OfType: " + sw.ElapsedMilliseconds.ToString());
    sw.Reset();

    sw.Start();
    var arr2 = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
              .Cast<Match>()
              .Select(m => m.Groups[0].Value)
              .ToArray();
    sw.Stop();
    Console.WriteLine("Cast: " + sw.ElapsedMilliseconds.ToString());
}

The output (elapsed milliseconds) is:

OfType: 6540
Cast: 8743

So for very long strings, Cast() is slower.

  • http://stackoverflow.com/questions/11430570/why-is-oftype-faster-than-cast (2 upvotes)

Dav*_*Mar 6

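I ran exactly the same benchmark that Alex posted and found that sometimes Cast was faster and sometimes OfType was faster, but the difference between the two was negligible. However, ugly as it is, the for loop was consistently faster than both.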

Stopwatch sw = new Stopwatch();
StringBuilder sb = new StringBuilder();
string strText = "this will become a very long string after my code has done appending it to the stringbuilder ";
Enumerable.Range(1, 100000).ToList().ForEach(i => sb.Append(strText));
strText = sb.ToString();

//First two benchmarks

sw.Start();
MatchCollection mc = Regex.Matches(strText, @"\b[A-Za-z-']+\b");
var matches = new string[mc.Count];
for (int i = 0; i < matches.Length; i++)
{
    matches[i] = mc[i].ToString();
}
sw.Stop();

Results (elapsed milliseconds):

OfType: 3462
Cast: 3499
For: 2650
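
Because single-shot timings in a fixed order can be skewed by one-time costs such as JIT compilation and the first parse of the pattern, the run-to-run variation reported above is not surprising. The following is a rough sketch, not a rigorous benchmark, that warms each variant up once and then keeps the best of several timed runs. The names BestOf, WithCast, WithOfType and WithFor are hypothetical, and the input size and run count are arbitrary.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

static class RepeatedBenchmarkSketch
{
    const string Pattern = @"\b[A-Za-z-']+\b";

    public static string[] WithCast(string text) =>
        Regex.Matches(text, Pattern).Cast<Match>().Select(m => m.Value).ToArray();

    public static string[] WithOfType(string text) =>
        Regex.Matches(text, Pattern).OfType<Match>().Select(m => m.Value).ToArray();

    public static string[] WithFor(string text)
    {
        MatchCollection mc = Regex.Matches(text, Pattern);
        var result = new string[mc.Count];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = mc[i].Value;
        }
        return result;
    }

    // Runs the action once untimed as a warm-up, then returns the fastest
    // of `runs` timed executions in milliseconds.
    public static long BestOf(int runs, Action action)
    {
        action();
        long best = long.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            best = Math.Min(best, sw.ElapsedMilliseconds);
        }
        return best;
    }

    static void Main()
    {
        // Smaller input than in the answers above, to keep the sketch quick.
        var sb = new StringBuilder();
        for (int i = 0; i < 10000; i++)
        {
            sb.Append("this will become a very long string ");
        }
        string text = sb.ToString();

        Console.WriteLine("OfType: " + BestOf(5, () => WithOfType(text)));
        Console.WriteLine("Cast: " + BestOf(5, () => WithCast(text)));
        Console.WriteLine("For: " + BestOf(5, () => WithFor(text)));
    }
}
```

For numbers worth acting on, a dedicated benchmarking harness would be a better choice than hand-rolled Stopwatch loops.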