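I have a large List<MyClass> of objects, around 600,000 of them.
MyClass has about 10 properties, say property1, property2, and so on up to property10.
From that list I want to get a list of List<MyClass>, where each inner list contains the objects that have the same values for certain properties.
For example, the objects for which property2, property4, property8 and property10 are all the same.
What is the best way to do this? Currently I loop over my List<MyClass> and, inside that loop, get all similar objects via List<MyClass>.FindAll(). Pseudocode: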
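// Items that have already been assigned to a group
var found = new List<MyClass>();
// One inner list per set of similar objects
var similarObjectsList = new List<List<MyClass>>();

foreach (var item in myClassList)
{
    if (!found.Contains(item))
    {
        // Scans the whole list once for every item that is not yet in "found"
        var similarObjects = myClassList.FindAll(x =>
            x.property2 == item.property2 &&
            x.property4 == item.property4 &&
            x.property8 == item.property8 &&
            x.property10 == item.property10);

        // Add the objects to the "already found" list
        foreach (var foundItem in similarObjects)
        {
            found.Add(foundItem);
        }

        if (similarObjects.Count > 1)
        {
            similarObjectsList.Add(similarObjects);
        }
    }
}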
But this takes a very long time; the List.FindAll() method is slow.
Is there a more efficient algorithm?
You can solve this very efficiently using group by:
var grouped =
    from item in myClassList
    group item
    by new {item.Property2, item.Property4, item.Property8, item.Property10};
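This gives you a sequence of groups, where each group contains all the objects that have identical values for the specified properties.
For example, to iterate over every item in every group of the resulting sequence, you could do something like this: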
foreach (var group in grouped)
{
    foreach (var item in group)
    {
        // Do something with item
    }
}
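The question asks for a list of lists that only contains sets with more than one similar object. A minimal sketch of how the grouping could be turned into that shape follows; the `Data` class and the `DuplicateFinder.FindSimilar` helper are hypothetical names introduced for illustration, and the properties are assumed to be `int`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Demo
{
    // Hypothetical stand-in for MyClass, reduced to the properties used as the key.
    public class Data
    {
        public string Name { get; set; }
        public int Property2 { get; set; }
        public int Property4 { get; set; }
        public int Property8 { get; set; }
        public int Property10 { get; set; }
    }

    public static class DuplicateFinder
    {
        // Hypothetical helper: groups by the four properties and keeps
        // only the groups that contain more than one object.
        public static List<List<Data>> FindSimilar(IEnumerable<Data> items)
        {
            return items
                .GroupBy(x => new { x.Property2, x.Property4, x.Property8, x.Property10 })
                .Where(g => g.Count() > 1)
                .Select(g => g.ToList())
                .ToList();
        }
    }

    public static class Program
    {
        public static void Main()
        {
            var myClassList = new List<Data>
            {
                new Data { Name = "1A", Property2 = 1, Property4 = 1, Property8 = 1, Property10 = 1 },
                new Data { Name = "1B", Property2 = 1, Property4 = 1, Property8 = 1, Property10 = 1 },
                new Data { Name = "2A", Property2 = 2, Property4 = 2, Property8 = 2, Property10 = 2 },
            };

            // "2A" has no partner, so its group is filtered out.
            foreach (var similarObjects in DuplicateFinder.FindSimilar(myClassList))
            {
                Console.WriteLine(string.Join(", ", similarObjects.Select(x => x.Name)));
            }
            // prints "1A, 1B"
        }
    }
}
```

Unlike the loop in the question, this makes a single pass over the list to build the groups, so no separate "already found" list is needed.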
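Note that grouping this way assumes the type of each property used in the key implements value equality (IEquatable<T> or an overridden Equals()) together with a matching GetHashCode().
Here is a compilable example: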
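To illustrate the equality requirement, here is a small sketch in which one of the key properties is a custom class. The `Tag`, `Item`, `Label`, `Code` and `CountGroups` names are all hypothetical. Because `Tag` implements `IEquatable<Tag>` and overrides `GetHashCode()`, two distinct `Tag` instances holding the same value end up in the same group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace EqualityDemo
{
    // Hypothetical property type with value equality.
    public sealed class Tag : IEquatable<Tag>
    {
        public string Value { get; }

        public Tag(string value) { Value = value; }

        public bool Equals(Tag other) => other != null && Value == other.Value;

        public override bool Equals(object obj) => Equals(obj as Tag);

        public override int GetHashCode() => Value == null ? 0 : Value.GetHashCode();
    }

    // Hypothetical item type whose grouping key includes a Tag.
    public class Item
    {
        public string Name { get; set; }
        public Tag Label { get; set; }
        public int Code { get; set; }
    }

    public static class Program
    {
        // Groups by (Label, Code) and returns how many groups were produced.
        public static int CountGroups(IEnumerable<Item> items) =>
            items.GroupBy(x => new { x.Label, x.Code }).Count();

        public static void Main()
        {
            var items = new List<Item>
            {
                new Item { Name = "A", Label = new Tag("red"),  Code = 1 },
                new Item { Name = "B", Label = new Tag("red"),  Code = 1 },
                new Item { Name = "C", Label = new Tag("blue"), Code = 1 },
            };

            // "A" and "B" share a group because their Tag values compare equal.
            Console.WriteLine(CountGroups(items)); // prints 2
        }
    }
}
```

If `Tag` did not define value equality, the default reference comparison would apply and each item above would land in its own group.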
using System;
using System.Collections.Generic;
using System.Linq;

namespace Demo
{
    class Data
    {
        public string Name { get; set; }
        public int Property1 { get; set; }
        public int Property2 { get; set; }
        public int Property3 { get; set; }
        public int Property4 { get; set; }
        public int Property5 { get; set; }
        public int Property6 { get; set; }
        public int Property7 { get; set; }
        public int Property8 { get; set; }
        public int Property9 { get; set; }
        public int Property10 { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            List<Data> myClassList = new List<Data>
            {
                new Data {Name = "1A", Property2 = 1, Property4 = 1, Property8 = 1, Property10 = 1},
                new Data {Name = "1B", Property2 = 1, Property4 = 1, Property8 = 1, Property10 = 1},
                new Data {Name = "1C", Property2 = 1, Property4 = 1, Property8 = 1, Property10 = 1},
                new Data {Name = "2A", Property2 = 2, Property4 = 2, Property8 = 2, Property10 = 2},
                new Data {Name = "2B", Property2 = 2, Property4 = 2, Property8 = 2, Property10 = 2},
                new Data {Name = "2C", Property2 = 2, Property4 = 2, Property8 = 2, Property10 = 2},
                new Data {Name = "3A", Property2 = 3, Property4 = 3, Property8 = 3, Property10 = 3},
                new Data {Name = "3B", Property2 = 3, Property4 = 3, Property8 = 3, Property10 = 3},
                new Data {Name = "3C", Property2 = 3, Property4 = 3, Property8 = 3, Property10 = 3},
            };

            var grouped =
                from item in myClassList
                group item
                by new {item.Property2, item.Property4, item.Property8, item.Property10};

            foreach (var group in grouped)
            {
                Console.WriteLine(string.Join(", ", group.Select(item => item.Name)));
            }
        }
    }
}
The example above outputs:
1A, 1B, 1C
2A, 2B, 2C
3A, 3B, 3C
Possible optimisation using PLINQ
As @BertPersyn mentions below, you could speed this up using PLINQ.
To do that, just use the following to generate grouped (note the addition of .AsParallel()):
var grouped =
    from item in myClassList.AsParallel()
    group item
    by new {item.Property2, item.Property4, item.Property8, item.Property10};
To determine whether this actually speeds things up, you will need to perform some timings.
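One rough way to take such timings is with `System.Diagnostics.Stopwatch`. The sketch below is only an assumption about how a comparison could be set up: the `Data` class, the `MakeData` helper, the random test data and the value ranges are made up for illustration, and results will vary with the machine and the real data.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

namespace TimingDemo
{
    // Hypothetical stand-in for MyClass, reduced to the key properties.
    public class Data
    {
        public int Property2 { get; set; }
        public int Property4 { get; set; }
        public int Property8 { get; set; }
        public int Property10 { get; set; }
    }

    public static class Program
    {
        // Builds made-up test data; a fixed seed keeps runs comparable.
        public static Data[] MakeData(int count, int seed)
        {
            var rng = new Random(seed);
            return Enumerable.Range(0, count)
                .Select(_ => new Data
                {
                    Property2 = rng.Next(10),
                    Property4 = rng.Next(10),
                    Property8 = rng.Next(10),
                    Property10 = rng.Next(10),
                })
                .ToArray();
        }

        public static int CountGroupsSequential(Data[] items) =>
            items
                .GroupBy(x => new { x.Property2, x.Property4, x.Property8, x.Property10 })
                .Count();

        public static int CountGroupsParallel(Data[] items) =>
            items.AsParallel()
                .GroupBy(x => new { x.Property2, x.Property4, x.Property8, x.Property10 })
                .Count();

        public static void Main()
        {
            var items = MakeData(600000, 42);

            var stopwatch = Stopwatch.StartNew();
            int sequentialGroups = CountGroupsSequential(items);
            stopwatch.Stop();
            Console.WriteLine($"Sequential: {sequentialGroups} groups in {stopwatch.ElapsedMilliseconds} ms");

            stopwatch.Restart();
            int parallelGroups = CountGroupsParallel(items);
            stopwatch.Stop();
            Console.WriteLine($"Parallel:   {parallelGroups} groups in {stopwatch.ElapsedMilliseconds} ms");
        }
    }
}
```

A single run like this includes JIT warm-up in the first measurement, so repeating each measurement several times, in a release build, gives a fairer picture.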