C# LINQ: find duplicates in a List

Mir*_*ese 293 linq list duplicate-removal

Using LINQ, from a List<int>, how can I retrieve a list that contains the entries repeated more than once, together with their values?

Sav*_*ave 496

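The easiest way to solve the problem is to group the elements by their value, and then pick a representative of each group that contains more than one element. In LINQ, this translates to: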

var query = lst.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .Select(y => y.Key)
              .ToList();

If you want to know how many times each element is repeated, you can use:

var query = lst.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .Select(y => new { Element = y.Key, Counter = y.Count() })
              .ToList();

This returns a List of an anonymous type, and each element has the properties Element and Counter to retrieve the information you need.

Lastly, if it is a dictionary you are looking for, you can use:

var query = lst.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .ToDictionary(x => x.Key, y => y.Count());

This returns a dictionary with the element as the key and the number of times it is repeated as the value.
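As a minimal sketch, the queries above can be run on concrete data like this. The sample list and the names `DuplicateDemo`, `DuplicateValues` and `DuplicateCounts` are illustrative assumptions, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateDemo
{
    // Values that occur more than once, one entry per value.
    public static List<int> DuplicateValues(IEnumerable<int> source) =>
        source.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .Select(g => g.Key)
              .ToList();

    // Each duplicated value mapped to the number of times it occurs.
    public static Dictionary<int, int> DuplicateCounts(IEnumerable<int> source) =>
        source.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .ToDictionary(g => g.Key, g => g.Count());

    public static void Main()
    {
        var lst = new List<int> { 1, 2, 3, 1, 4, 2, 1 };

        Console.WriteLine(string.Join(", ", DuplicateValues(lst))); // 1, 2

        foreach (var pair in DuplicateCounts(lst))
            Console.WriteLine($"{pair.Key} occurs {pair.Value} times");
    }
}
```

GroupBy yields groups in the order in which each key first appears, which is why the duplicated values come back as 1 and then 2.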

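  • It is more efficient to use Skip(1).Any() instead of Count() to check whether a collection has more than one element. Imagine a collection with 1000 elements. Skip(1).Any() detects that there is more than one element as soon as it finds the second one, whereas Count() has to visit the whole collection. (4 upvotes)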
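The suggestion in this comment could be sketched as follows. The sample data and the name `SkipAnyDemo` are assumptions made for illustration. Whether it is measurably faster depends on how the grouping implements Count(), so treat it as a variant to benchmark rather than a guaranteed improvement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipAnyDemo
{
    // Same result as the Count() > 1 filter, but each group is only
    // inspected up to its second element.
    public static List<int> DuplicateValues(IEnumerable<int> source) =>
        source.GroupBy(x => x)
              .Where(g => g.Skip(1).Any())
              .Select(g => g.Key)
              .ToList();

    public static void Main()
    {
        var lst = new List<int> { 5, 7, 5, 9, 7, 5 };
        Console.WriteLine(string.Join(", ", DuplicateValues(lst))); // 5, 7
    }
}
```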

max*_*oin 117

To find out whether the enumerable contains any duplicates:

var anyDuplicate = enumerable.GroupBy(x => x.Key).Any(g => g.Count() > 1);

To find out whether all values in the enumerable are unique:

var allUnique = enumerable.GroupBy(x => x.Key).All(g => g.Count() == 1);

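  • To get the duplicated items, just change Any to Where. (5 upvotes)
  • Is there any chance these are not always boolean opposites? In all cases, anyDuplicate == !allUnique. (3 upvotes)
  • @GarrGodfrey They are always boolean opposites. (3 upvotes)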
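The snippets in this answer group by `x.Key`, which assumes the elements expose a `Key` property. For a plain sequence of integers, a sketch using the identity selector `x => x` might look like this. The class and method names are hypothetical, and `Duplicated` shows the Any-to-Where swap mentioned in the comments.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniquenessDemo
{
    public static bool AnyDuplicate(IEnumerable<int> source) =>
        source.GroupBy(x => x).Any(g => g.Count() > 1);

    public static bool AllUnique(IEnumerable<int> source) =>
        source.GroupBy(x => x).All(g => g.Count() == 1);

    // Swapping Any for Where yields the duplicated groups themselves;
    // Select then reduces each group to its key.
    public static List<int> Duplicated(IEnumerable<int> source) =>
        source.GroupBy(x => x)
              .Where(g => g.Count() > 1)
              .Select(g => g.Key)
              .ToList();

    public static void Main()
    {
        var numbers = new[] { 1, 2, 3, 1, 4, 2 };
        Console.WriteLine(AnyDuplicate(numbers));                  // True
        Console.WriteLine(AllUnique(numbers));                     // False
        Console.WriteLine(string.Join(", ", Duplicated(numbers))); // 1, 2
    }
}
```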

HuB*_*eZa 19

Another way is to use a HashSet:

var hash = new HashSet<int>();
var duplicates = list.Where(i => !hash.Add(i));

If you need only the unique values from the list of duplicates:

var myhash = new HashSet<int>();
var mylist = new List<int>(){1,1,2,2,3,3,3,4,4,4};
var duplicates = mylist.Where(item => !myhash.Add(item)).Distinct().ToList();
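One thing to keep in mind with the HashSet approach: Where is evaluated lazily and the lambda mutates the set, so enumerating the same query a second time would give different results. A sketch that materializes the result once is shown here. The names `HashSetDemo` and `RepeatedOccurrences` are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashSetDemo
{
    // HashSet<T>.Add returns false when the item is already present,
    // so the filter keeps every occurrence after the first one.
    public static List<int> RepeatedOccurrences(IEnumerable<int> source)
    {
        var seen = new HashSet<int>();
        return source.Where(item => !seen.Add(item)).ToList();
    }

    public static void Main()
    {
        var mylist = new List<int> { 1, 1, 2, 2, 3, 3, 3, 4, 4, 4 };

        var repeated = RepeatedOccurrences(mylist);
        Console.WriteLine(string.Join(", ", repeated)); // 1, 2, 3, 3, 4, 4

        // Distinct collapses the repeated occurrences to one entry per value.
        var distinctDuplicates = repeated.Distinct().ToList();
        Console.WriteLine(distinctDuplicates.Count); // 4
    }
}
```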

Here is the same solution as generic extension methods:

public static class Extensions
{
  public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector, IEqualityComparer<TKey> comparer)
  {
    var hash = new HashSet<TKey>(comparer);
    return source.Where(item => !hash.Add(selector(item))).ToList();
  }

  public static IEnumerable<TSource> GetDuplicates<TSource>(this IEnumerable<TSource> source, IEqualityComparer<TSource> comparer)
  {
    return source.GetDuplicates(x => x, comparer);      
  }

  public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
  {
    return source.GetDuplicates(selector, null);
  }

  public static IEnumerable<TSource> GetDuplicates<TSource>(this IEnumerable<TSource> source)
  {
    return source.GetDuplicates(x => x, null);
  }
}
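A usage sketch for the overload that takes a key selector and a comparer follows. It includes a condensed copy of the three-argument method, under the hypothetical class name `DuplicateExtensions`, so that the snippet compiles on its own. The sample words and the choice of `StringComparer.OrdinalIgnoreCase` are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateExtensions
{
    // Condensed copy of the three-argument overload from the answer.
    public static IEnumerable<TSource> GetDuplicates<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> selector,
        IEqualityComparer<TKey> comparer)
    {
        var hash = new HashSet<TKey>(comparer);
        return source.Where(item => !hash.Add(selector(item))).ToList();
    }
}

public static class GetDuplicatesDemo
{
    public static void Main()
    {
        var words = new[] { "apple", "Apple", "pear", "APPLE", "Pear" };

        // Case-insensitive comparison: every occurrence after the first is returned.
        var duplicates = words.GetDuplicates(w => w, StringComparer.OrdinalIgnoreCase);

        Console.WriteLine(string.Join(", ", duplicates)); // Apple, APPLE, Pear
    }
}
```

Because the method calls ToList() before returning, the HashSet is consumed exactly once and the caller can enumerate the result repeatedly.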


Ale*_*man 10

You can do this:

var list = new[] {1,2,3,1,4,2};
var duplicateItems = list.Duplicates();

Using these extension methods:

public static class Extensions
{
    public static IEnumerable<TSource> Duplicates<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> selector)
    {
        var grouped = source.GroupBy(selector);
        var moreThan1 = grouped.Where(i => i.IsMultiple());
        return moreThan1.SelectMany(i => i);
    }

    // Only TSource is declared here; an unused TKey could not be inferred at the call site.
    public static IEnumerable<TSource> Duplicates<TSource>(this IEnumerable<TSource> source)
    {
        return source.Duplicates(i => i);
    }

    public static bool IsMultiple<T>(this IEnumerable<T> source)
    {
        // Dispose the enumerator once the check is done.
        using (var enumerator = source.GetEnumerator())
        {
            return enumerator.MoveNext() && enumerator.MoveNext();
        }
    }
}

Using IsMultiple() in the Duplicates method is faster than Count() because it does not iterate the whole collection.
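To see what these methods return, here is a self-contained sketch. The class names `GroupingExtensions` and `DuplicatesDemo` are hypothetical, and in this sketch the parameterless overload declares only `TSource` so that `list.Duplicates()` can be inferred by the compiler.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingExtensions
{
    public static IEnumerable<TSource> Duplicates<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> selector)
    {
        return source.GroupBy(selector)
                     .Where(g => g.IsMultiple())
                     .SelectMany(g => g);
    }

    // Only TSource is declared here so that list.Duplicates() can infer it.
    public static IEnumerable<TSource> Duplicates<TSource>(this IEnumerable<TSource> source)
    {
        return source.Duplicates(i => i);
    }

    public static bool IsMultiple<T>(this IEnumerable<T> source)
    {
        using (var enumerator = source.GetEnumerator())
        {
            // True as soon as a second element exists.
            return enumerator.MoveNext() && enumerator.MoveNext();
        }
    }
}

public static class DuplicatesDemo
{
    public static void Main()
    {
        var list = new[] { 1, 2, 3, 1, 4, 2 };
        Console.WriteLine(string.Join(", ", list.Duplicates())); // 1, 1, 2, 2
    }
}
```

Unlike the queries that select `g.Key`, this version returns every member of each duplicated group, so a value that appears twice shows up twice in the result.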

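  • If you look at the [reference source for Grouping](http://referencesource.microsoft.com/System.Core/System/Linq/Enumerable.cs.html#2177), you can see that `Count()` **is** precomputed, so your solution may well be slower. (2 upvotes)
  • @RehanKhan: IsMultiple does not call Count(); it stops right after 2 items, just like Take(2).Count >= 2. (2 upvotes)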

小智 10

To find only the duplicate values:

var duplicates = list.GroupBy(x => x.Key).Any(g => g.Count() > 1);

For example:

var list = new[] {1,2,3,1,4,2};

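GroupBy groups the numbers by their keys and keeps a count (the number of repetitions) with each group. After that, we simply check for values that are repeated more than once.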

To find only the unique values:

var unique = list.GroupBy(x => x.Key).All(g => g.Count() == 1);

For example:

var list = new[] {1,2,3,1,4,2};

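GroupBy groups the numbers by their keys and keeps a count (the number of repetitions) with each group. After that, we simply check that every value occurs exactly once, that is, that it is unique.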

  • Both examples return only a boolean, which is not what the OP asked for. (2 upvotes)

Ric*_*edo 6

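In response to this, I created an extension that you can include in your projects. I think it covers most cases when you search for duplicates in a List or with LINQ.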

Example:

//Dummy class to compare in list
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Surname { get; set; }
    public Person(int id, string name, string surname)
    {
        this.Id = id;
        this.Name = name;
        this.Surname = surname;
    }
}


//The extension static class
public static class Extention
{
    public static IEnumerable<T> getMoreThanOnceRepeated<T>(this IEnumerable<T> extList, Func<T, object> groupProps) where T : class
    { //Return only the second and subsequent occurrences
        return extList
            .GroupBy(groupProps)
            .SelectMany(z => z.Skip(1)); //Skip the first occurrence and return all the others that repeat it
    }
    public static IEnumerable<T> getAllRepeated<T>(this IEnumerable<T> extList, Func<T, object> groupProps) where T : class
    {
        //Get all the items that have repeats
        return extList
            .GroupBy(groupProps)
            .Where(z => z.Count() > 1) //Keep only the groups with more than one item
            .SelectMany(z => z); //Return every item of those groups
    }
}

//how to use it:
void DuplicateExample()
{
    //Populate List
    List<Person> PersonsLst = new List<Person>(){
    new Person(1,"Ricardo","Figueiredo"), //first duplicate in the example
    new Person(2,"Ana","Figueiredo"),
    new Person(3,"Ricardo","Figueiredo"), //second duplicate in the example
    new Person(4,"Margarida","Figueiredo"),
    new Person(5,"Ricardo","Figueiredo") //third duplicate in the example
    };

    Console.WriteLine("All:");
    PersonsLst.ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
    /* OUTPUT:
        All:
        1 -> Ricardo Figueiredo
        2 -> Ana Figueiredo
        3 -> Ricardo Figueiredo
        4 -> Margarida Figueiredo
        5 -> Ricardo Figueiredo
        */

    Console.WriteLine("All lines with repeated data");
    PersonsLst.getAllRepeated(z => new { z.Name, z.Surname })
        .ToList()
        .ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
    /* OUTPUT:
        All lines with repeated data
        1 -> Ricardo Figueiredo
        3 -> Ricardo Figueiredo
        5 -> Ricardo Figueiredo
        */
    Console.WriteLine("Only Repeated more than once");
    PersonsLst.getMoreThanOnceRepeated(z => new { z.Name, z.Surname })
        .ToList()
        .ForEach(z => Console.WriteLine("{0} -> {1} {2}", z.Id, z.Name, z.Surname));
    /* OUTPUT:
        Only Repeated more than once
        3 -> Ricardo Figueiredo
        5 -> Ricardo Figueiredo
        */
}