Performance of Find() vs. FirstOrDefault()

Question

Similar question:
Find() vs. Where().FirstOrDefault()

Searching for Diana in a large sequence of a simple reference type with a single string property gave an interesting result.

using System;
using System.Collections.Generic;
using System.Diagnostics; // Stopwatch lives here
using System.Linq;

public class Customer
{
    public string Name { get; set; }
}

public static class Program
{
    public static void Main()
    {
        Stopwatch watch = new Stopwatch();
        const string diana = "Diana";

        // Each key press runs one round of measurements; Esc exits.
        while (Console.ReadKey().Key != ConsoleKey.Escape)
        {
            // Armed with 1000k++ customers. Wow, should be a product with a great success! :)
            var customers = (from i in Enumerable.Range(0, 1000000)
                             select new Customer
                             {
                                 Name = Guid.NewGuid().ToString()
                             }).ToList();

            customers.Insert(999000, new Customer { Name = diana }); // Putting Diana at the end :)

            // 1. System.Linq.Enumerable.FirstOrDefault()
            watch.Restart();
            customers.FirstOrDefault(c => c.Name == diana);
            watch.Stop();
            Console.WriteLine("Diana was found in {0} ms with System.Linq.Enumerable.FirstOrDefault().", watch.ElapsedMilliseconds);

            // 2. System.Collections.Generic.List<T>.Find()
            watch.Restart();
            customers.Find(c => c.Name == diana);
            watch.Stop();
            Console.WriteLine("Diana was found in {0} ms with System.Collections.Generic.List<T>.Find().", watch.ElapsedMilliseconds);
        }
    }
}

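Is this because List.Find() has no enumerator overhead, or is it that plus possibly something else?

Find() runs almost twice as fast. I hope the .NET team will not mark it obsolete in the future.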

Answer 1

dev*_*rts 100

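I was able to reproduce your results, so I decompiled your program, and there is a difference between Find and FirstOrDefault.

First, here is the decompiled program. I made your data object an anonymous type just so it would compile: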

    List<\u003C\u003Ef__AnonymousType0<string>> source = Enumerable.ToList(Enumerable.Select(Enumerable.Range(0, 1000000), i =>
    {
      var local_0 = new
      {
        Name = Guid.NewGuid().ToString()
      };
      return local_0;
    }));
    source.Insert(999000, new
    {
      Name = diana
    });
    stopwatch.Restart();
    Enumerable.FirstOrDefault(source, c => c.Name == diana);
    stopwatch.Stop();
    Console.WriteLine("Diana was found in {0} ms with System.Linq.Enumerable.FirstOrDefault().", (object) stopwatch.ElapsedMilliseconds);
    stopwatch.Restart();
    source.Find(c => c.Name == diana);
    stopwatch.Stop();
    Console.WriteLine("Diana was found in {0} ms with System.Collections.Generic.List<T>.Find().", (object) stopwatch.ElapsedMilliseconds);

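The key thing to notice here is that FirstOrDefault is called on Enumerable, whereas Find is called as a method on the source list.

So, what is Find doing? This is the decompiled Find method: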

private T[] _items;

[__DynamicallyInvokable]
public T Find(Predicate<T> match)
{
  if (match == null)
    ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
  for (int index = 0; index < this._size; ++index)
  {
    if (match(this._items[index]))
      return this._items[index];
  }
  return default (T);
}

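So it iterates over an array of items, which makes sense, since a list is a wrapper around an array.

FirstOrDefault, however, lives on the Enumerable class and uses foreach to iterate the items. That obtains an iterator for the list and calls MoveNext on it. I think what you are seeing is the overhead of the iterator: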

[__DynamicallyInvokable]
public static TSource FirstOrDefault<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
  if (source == null)
    throw Error.ArgumentNull("source");
  if (predicate == null)
    throw Error.ArgumentNull("predicate");
  foreach (TSource source1 in source)
  {
    if (predicate(source1))
      return source1;
  }
  return default (TSource);
}

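foreach is just syntactic sugar over the enumerable pattern. Take a look at this image:

[image: dotPeek screenshot]

I clicked on foreach to see what it was doing, and you can see that dotPeek wants to take me to the enumerator/Current/MoveNext implementations, which makes sense.

Other than that, they are basically the same (both test the passed-in predicate to see whether an item is the one you want).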
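
To make the comparison concrete, here is a minimal sketch of roughly what the foreach inside FirstOrDefault expands to when the source is only known as IEnumerable<T>, next to an indexed loop shaped like the decompiled Find. The names ForeachExpansion, FirstOrDefaultExpanded and FindIndexed are hypothetical and exist only for this illustration; the real framework code is the decompiled source shown above.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachExpansion
{
    // Hypothetical helper: approximately what the compiler generates for
    // "foreach (T item in source)" over an IEnumerable<T>.
    public static T FirstOrDefaultExpanded<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));

        // One enumerator object, then two interface calls (MoveNext, Current) per item.
        using (IEnumerator<T> enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                T item = enumerator.Current;
                if (predicate(item))
                    return item;
            }
        }
        return default(T);
    }

    // Hypothetical helper: same shape as the decompiled Find, but over a plain
    // array passed in by the caller (List<T> keeps its array private).
    public static T FindIndexed<T>(T[] items, int size, Predicate<T> match)
    {
        if (match == null) throw new ArgumentNullException(nameof(match));

        for (int index = 0; index < size; ++index)
        {
            if (match(items[index]))
                return items[index];
        }
        return default(T);
    }

    public static void Main()
    {
        var names = new List<string> { "Alice", "Bob", "Diana" };
        Console.WriteLine(FirstOrDefaultExpanded(names, n => n == "Diana"));
        Console.WriteLine(FindIndexed(names.ToArray(), names.Count, n => n == "Diana"));
    }
}
```

Both helpers return the same element; the difference is only in how each item is reached, which is where the extra per-item cost is likely to come from.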

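It's now 100% obvious what the only difference between them is; I was expecting to see something else, something harder to discern. It's always interesting to see what happens under the hood of the .NET Framework. Thanks! (5 upvotes)
To help clarify the performance difference: `Find()` on a list does not use LINQ. See @Chris Sinclair's answer. (2 upvotes)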

Answer 2

Chr*_*air 24

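I'm betting that FirstOrDefault runs through the IEnumerable implementation, that is, it uses a standard foreach loop to do the checking. List<T>.Find() is not part of LINQ (http://msdn.microsoft.com/en-us/library/x0b5b5bc.aspx), and is likely using a standard for loop from 0 to Count (or another fast internal mechanism, probably operating directly on its internal/wrapped array). By getting rid of the overhead of enumerating (and of the version checks that ensure the list hasn't been modified), the Find method is faster.

If you add a third test: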
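
The version check mentioned above can be observed directly. The following is a small sketch, with the hypothetical names VersionCheckDemo, ForeachDetectsModification and IndexedLoopVisits: modifying a List<T> inside a foreach makes the enumerator throw InvalidOperationException on its next step, while an indexed for loop performs no such check.

```csharp
using System;
using System.Collections.Generic;

public static class VersionCheckDemo
{
    // Returns true when the List<T> enumerator notices that the list was
    // modified during enumeration.
    public static bool ForeachDetectsModification()
    {
        var list = new List<int> { 1, 2, 3 };
        try
        {
            foreach (int x in list)
            {
                if (x == 1)
                    list.Add(4); // changes the list's internal version
            }
        }
        catch (InvalidOperationException)
        {
            return true; // thrown by the next MoveNext after the Add
        }
        return false;
    }

    // The indexed loop has no version check: it keeps going and also visits
    // the element added during the loop.
    public static int IndexedLoopVisits()
    {
        var list = new List<int> { 1, 2, 3 };
        int visited = 0;
        for (int i = 0; i < list.Count; i++)
        {
            if (list[i] == 1)
                list.Add(4);
            visited++;
        }
        return visited;
    }

    public static void Main()
    {
        Console.WriteLine(ForeachDetectsModification()); // True
        Console.WriteLine(IndexedLoopVisits());          // 4
    }
}
```

That safety check is part of what the enumerator pays for on every item, and part of what an indexed loop skips.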

//3. System.Collections.Generic.List<T> foreach
Func<Customer, bool> dianaCheck = c => c.Name == diana;
watch.Restart();
foreach(var c in customers)
{
    if (dianaCheck(c))
        break;
}
watch.Stop();
Console.WriteLine("Diana was found in {0} ms with System.Collections.Generic.List<T> foreach.", watch.ElapsedMilliseconds);

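It runs at about the same speed as the first one (25 ms vs. 27 ms for FirstOrDefault).

EDIT: If I add an array loop, it gets pretty close to the Find() speed, and given @devshorts' peek at the source code, I think this is it: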

//4. Array for loop
var customersArray = customers.ToArray();
watch.Restart();
int customersCount = customersArray.Length;
for (int i = 0; i < customersCount; i++)
{
    // Index the array copy, not the list, so the loop really measures array access
    if (dianaCheck(customersArray[i]))
        break;
}
watch.Stop();
Console.WriteLine("Diana was found in {0} ms with an array for loop.", watch.ElapsedMilliseconds);

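That runs only 5.5% slower than the Find() method.

So the bottom line: looping through array elements is faster than dealing with foreach iteration overhead. (But both have their pros and cons, so just choose what makes sense for your code logically. Furthermore, the small difference in speed will only rarely cause a problem, so just use whatever makes sense for maintainability/readability.)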
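
Because single Stopwatch readings of a few milliseconds are noisy, a slightly more careful measurement might look like the sketch below. It is not the method used in either answer: the helper BestOf and the class RepeatedTiming are hypothetical, the run count of 5 is an arbitrary assumption, and the numbers will vary with the runtime and machine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class RepeatedTiming
{
    // Hypothetical helper: runs the action once untimed (so JIT compilation is
    // not measured), then returns the fastest of `runs` timed executions.
    public static TimeSpan BestOf(int runs, Action action)
    {
        action(); // warm-up
        TimeSpan best = TimeSpan.MaxValue;
        var watch = new Stopwatch();
        for (int i = 0; i < runs; i++)
        {
            watch.Restart();
            action();
            watch.Stop();
            if (watch.Elapsed < best)
                best = watch.Elapsed;
        }
        return best;
    }

    public static void Main()
    {
        const string diana = "Diana";
        List<string> names = Enumerable.Range(0, 1000000)
                                       .Select(i => Guid.NewGuid().ToString())
                                       .ToList();
        names.Insert(999000, diana); // same position as in the question

        string viaLinq = null, viaFind = null;
        TimeSpan linq = BestOf(5, () => viaLinq = names.FirstOrDefault(n => n == diana));
        TimeSpan find = BestOf(5, () => viaFind = names.Find(n => n == diana));

        Console.WriteLine("FirstOrDefault: {0} ms (found {1})", linq.TotalMilliseconds, viaLinq);
        Console.WriteLine("Find:           {0} ms (found {1})", find.TotalMilliseconds, viaFind);
    }
}
```

Taking the best of several runs reduces the influence of JIT compilation and garbage collection on the comparison, but it is still a rough measurement rather than a rigorous benchmark.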

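@ChrisSinclar Even a better algorithm O(sorry) :) I'm more surprised by another comment, where he said that it takes 176 ms on Mono. And that is only for the simplest single-property class. What would happen with even 10k real customers on a server with 1000 concurrent clients (we often deal with similar situations)? That is the cost of LINQ, lambdas, delegates, iterators, enumerations, reflection and the other idioms that make our lives easier in C#. (3 upvotes)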
