Finding objects with the same properties in a large List - slow performance

flo*_*flo 4 c# list

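I have a large List<MyClass> of objects, around 600,000. MyClass has about 10 properties, let's say property1, property2, and so on up to property10.

From that list, I want to get a List of List<MyClass>, where each inner list contains the objects that have the same values for certain properties.

For example, that means the objects where property2, property4, property8 and property10 are the same.

What is the best way to do this? Currently I loop over my List<MyClass> and, inside that loop, get all similar objects via List<MyClass>.FindAll(). Pseudocode: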

foreach (var item in myClassList)
{
   if (!found.Contains(item))
   {
      var similarObjects = myClassList.FindAll(x => x.property2 == item.property2 && x.property4 == item.property4 && x.property8 == item.property8 && x.property10 == item.property10);

      // add the objects to the "already found" list
      foreach (var foundItem in similarObjects)
      {
         found.Add(foundItem);
      }

      if (similarObjects.Count > 1)
      {
         similarObjectsList.Add(similarObjects);
      }
   }
}

But this takes a very long time; the List.FindAll() method is slow.

Is there a more efficient algorithm for this?

Mat*_*son 5

You can solve this very efficiently with group by:

var grouped =
    from item in myClassList
    group item 
    by new {item.Property2, item.Property4, item.Property8, item.Property10};

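This gives you a sequence of groups, where each group contains all the objects that have the same values for the specified properties.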
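To get the List of List<MyClass> shape that the question asks for, keeping only the groups with more than one member, the grouping can be materialised roughly as follows. This is a minimal sketch: `MyClass` here is a stand-in with `int` properties, and `SimilarFinder.FindSimilar` is a hypothetical helper name, not something from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the question's class; the int property types are an assumption.
class MyClass
{
    public int Property2 { get; set; }
    public int Property4 { get; set; }
    public int Property8 { get; set; }
    public int Property10 { get; set; }
}

static class SimilarFinder
{
    // Hypothetical helper: groups by a hashed composite key in a single pass
    // and returns only the groups that contain at least two objects.
    public static List<List<MyClass>> FindSimilar(IEnumerable<MyClass> items)
    {
        return items
            .GroupBy(x => new { x.Property2, x.Property4, x.Property8, x.Property10 })
            .Where(g => g.Count() > 1)
            .Select(g => g.ToList())
            .ToList();
    }
}
```

Unlike the loop in the question, which rescans the whole list with FindAll() for every item, this hashes each item's key once.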

For example, to iterate over every item in every group of the resulting sequence of groups, you can do something like this:

foreach (var group in grouped)
{
    foreach (var item in group)
    {
        // Do something with item
    }
}

Note that this assumes that the type of each property implements IEquatable<T> and GetHashCode().

Here is a compilable example:
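To illustrate that requirement: if one of the grouped properties is a custom class, it needs value-based equality and a matching hash code, otherwise the default reference equality puts every object in its own group. A sketch, where `Point`, `Item` and `EqualityDemo.CountGroups` are hypothetical names used only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical property type with value-based equality.
sealed class Point : IEquatable<Point>
{
    public int X { get; }
    public int Y { get; }
    public Point(int x, int y) { X = x; Y = y; }

    public bool Equals(Point other) => other != null && X == other.X && Y == other.Y;
    public override bool Equals(object obj) => Equals(obj as Point);

    // Equal points must produce equal hash codes, or grouping will split them.
    public override int GetHashCode() => (X * 397) ^ Y;
}

class Item
{
    public string Name { get; set; }
    public Point Location { get; set; }
}

static class EqualityDemo
{
    // Counts the groups formed when Location is part of the anonymous key.
    public static int CountGroups(IEnumerable<Item> items) =>
        items.GroupBy(i => new { i.Location }).Count();
}
```

Built-in types such as int and string already satisfy this, which is why the examples here need no extra code.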

using System;
using System.Collections.Generic;
using System.Linq;

namespace Demo
{
    class Data
    {
        public string Name { get; set; }
        public int Property1  { get; set; }
        public int Property2  { get; set; }
        public int Property3  { get; set; }
        public int Property4  { get; set; }
        public int Property5  { get; set; }
        public int Property6  { get; set; }
        public int Property7  { get; set; }
        public int Property8  { get; set; }
        public int Property9  { get; set; }
        public int Property10 { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            List<Data> myClassList = new List<Data>
            {
                new Data {Name = "1A", Property2 = 1, Property4 = 1, Property8 = 1, Property10 = 1},
                new Data {Name = "1B", Property2 = 1, Property4 = 1, Property8 = 1, Property10 = 1},
                new Data {Name = "1C", Property2 = 1, Property4 = 1, Property8 = 1, Property10 = 1},
                new Data {Name = "2A", Property2 = 2, Property4 = 2, Property8 = 2, Property10 = 2},
                new Data {Name = "2B", Property2 = 2, Property4 = 2, Property8 = 2, Property10 = 2},
                new Data {Name = "2C", Property2 = 2, Property4 = 2, Property8 = 2, Property10 = 2},
                new Data {Name = "3A", Property2 = 3, Property4 = 3, Property8 = 3, Property10 = 3},
                new Data {Name = "3B", Property2 = 3, Property4 = 3, Property8 = 3, Property10 = 3},
                new Data {Name = "3C", Property2 = 3, Property4 = 3, Property8 = 3, Property10 = 3},
            };

            var grouped =
                from item in myClassList
                group item 
                by new {item.Property2, item.Property4, item.Property8, item.Property10};

            foreach (var group in grouped)
            {
                Console.WriteLine(string.Join(", ", group.Select(item => item.Name)));
            }
        }
    }
}

The example above outputs:

1A, 1B, 1C
2A, 2B, 2C
3A, 3B, 3C

Possible optimisation using PLINQ

As @BertPersyn mentions below, you may be able to speed this up using PLINQ.

To do so, just generate grouped using the following (note the added .AsParallel()):

var grouped = 
    from item in myClassList.AsParallel()
    group item 
    by new {item.Property2, item.Property4, item.Property8, item.Property10};

To determine whether this actually speeds things up, you will need to run some timings.
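One rough way to take such timings is with Stopwatch. The sketch below is an assumption-laden illustration rather than a proper benchmark: the data is synthetic, and `GroupTiming`, `TimeMs` and `Compare` are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class GroupTiming
{
    // Hypothetical helper: elapsed milliseconds for a single run of an action.
    public static long TimeMs(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Times sequential LINQ against PLINQ on synthetic data of the given size.
    public static (long Sequential, long Parallel) Compare(int count)
    {
        // Small value range (an assumption) so that many items share a key.
        var rng = new Random(42);
        var items = Enumerable.Range(0, count)
            .Select(_ => new { P2 = rng.Next(50), P4 = rng.Next(50), P8 = rng.Next(50), P10 = rng.Next(50) })
            .ToList();

        // ToList() forces the deferred query to execute inside the timed region.
        long sequential = TimeMs(() => items.GroupBy(x => new { x.P2, x.P4, x.P8, x.P10 }).ToList());
        long parallel = TimeMs(() => items.AsParallel().GroupBy(x => new { x.P2, x.P4, x.P8, x.P10 }).ToList());

        return (sequential, parallel);
    }
}
```

Calling `GroupTiming.Compare(600_000)` several times and ignoring the first run, which includes JIT compilation, gives a fairer picture. Results will depend on core count and on how the real data is distributed.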