IEnumerable <T>上的扩展方法:性能如何?

Vim*_*987 5 .net linq performance

来自我的导师:首选本地方法(直接在集合上实现)而不是IEnumerable的扩展方法,因为:

LINQ-to-Objects扩展方法在IEnumerable上实现,这意味着在最坏的情况下(当您搜索的项目不存在于集合中时),您将必须枚举所有元素.如果你有一个直接在集合上实现的Contains或Exists方法,它可以利用内部知识,也许只是做一个哈希表查找或其他一些快速操作.

我非常困惑,因为我认为微软应该已经为IEnumerable Contains/Exists实现了哈希表.List和IEnumerable的快速基准显示没有差异:

static void Main(string[] args)
{
    Console.Write("input the number of elements: ");
    int count = Convert.ToInt32(Console.ReadLine());
    Console.Write("input the number of loops: ");
    int loop = Convert.ToInt32(Console.ReadLine());

    Random r = new Random();

    Stopwatch sw = new Stopwatch();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            DoContains(list, r.Next());
        }
        sw.Stop();
    }

    Console.WriteLine("List<T> native method: Iterated {0} times on {1} elements, elapsed :{2}",loop,count,sw.Elapsed);

    sw.Reset();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            DoContainsEnumerable(list, r.Next());
        }
        sw.Stop();
    }

    Console.WriteLine("IEnumerable<T> extension method: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);

    sw.Reset();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt2(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            //make sure that the element is not in the list
            DoContains(list, r.Next(20000, 50000));
        }
        sw.Stop();
    }
    Console.WriteLine("List<T> native method: element does not exist:Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);

    sw.Reset();
    for (int i = 0; i < loop; i++)
    {
        var list = CreateListOfInt2(count);
        sw.Start();
        for (int j = 0; j < count; j++)
        {
            //make sure that the element is not in the list
            DoContainsEnumerable(list, r.Next(20000, 50000));
        }
        sw.Stop();
    }
    Console.WriteLine("IEnumerable<T> extension method: element does not exist: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);


    Console.ReadKey();
}

static List<int> CreateListOfInt(int count)
{
    Random r = new Random(1000);
    List<int> numbers = new List<int>(count);
    for (int i = 0; i < count; i++)
    {
        numbers.Add(r.Next());
    }
    return numbers;
}

static bool DoContains(List<int> list, int number)
{
    return list.Contains(number);
}

static bool DoContainsEnumerable(IEnumerable<int> list, int number)
{
    return list.Contains(number);
}


//define the scope of randomly created number, to make sure that lookup number will not in the List
static List<int> CreateListOfInt2(int count)
{
    Random r = new Random(1000);
    List<int> numbers = new List<int>(count);
    for (int i = 0; i < count; i++)
    {
        numbers.Add(r.Next(0,10000));
    }
    return numbers;
}
Run Code Online (Sandbox Code Playgroud)

}

编辑:我尝试了HashSet实现,这大大提高了性能:

  sw.Reset();
            for (int i = 0; i < loop; i++)
            {
                var list = CreateListOfInt2(count);
                HashSet<int> hashtable = new HashSet<int>(list);
                sw.Start();
                for (int j = 0; j < count; j++)
                {
                    //make sure that the element is not in the list
                    hashtable.Contains(r.Next(20000, 50000));
                }
                sw.Stop();
            }
            Console.WriteLine("IEnumerable<T> extension method: element does not exist: Iterated {0} times on {1} elements, elapsed :{2}", loop, count, sw.Elapsed);
Run Code Online (Sandbox Code Playgroud)

不过,你对我的导师说的意见是什么?

任何人都可以帮我清理一下吗?我的导师是对的吗?如果他是对的,我的代码出了什么问题?

非常感谢你

dle*_*lev 4

List<T> Contains调用只是迭代列表,因此它们不会比扩展方法更快。如果您使用 aHashSet<T>并尝试一系列Contains()操作,您会发现明显的改进。

编辑:微软没有对IEnumerable<T>扩展方法使用哈希的原因是他们无法保证实现类使用哈希或类似的东西。他们不得不采用简单的方法,因为IEnumerable<T>接口仅保证枚举实现类。