在列表列表中查找重复项

Question

在列表列表中查找重复项

简单的情况.我有一个列表列表,几乎像表格一样,我试图找出是否有任何列表重复.

例:

List<List<int>> list = new List<List<int>>(){
  new List<int>() {0 ,1 ,2, 3, 4, 5, 6 },
  new List<int>() {0 ,1 ,2, 3, 4, 5, 6 },
  new List<int>() {0 ,1 ,4, 2, 4, 5, 6 },
  new List<int>() {0 ,3 ,2, 5, 1, 6, 4 }
};

Run Code Online (Sandbox Code Playgroud)

我想知道总共有4个项目,其中2个是重复项目.我正在考虑做类似SQL校验和的事情,但我不知道是否有更好/更简单的方法.

我关心性能,我关心订购.

可能有用的其他信息

插入此列表的内容永远不会被删除
不受任何特定集合的约束.
不关心功能签名
它们的类型不限于int

Answer 1

And*_*rey 6

让我们试着获得最佳表现.如果n是列表的数量而m是列表的长度,那么我们可以得到O(n m + n logn + n)加上一些哈希码的概率对于不同的列表是相等的.

主要步骤:

计算哈希码*
排序他们
浏览列表以找到欺骗

*这是重要的一步.对于simlicity你可以将hash hash as = ... ^(list [i] << i)^(list [i + 1] <<(i + 1))

编辑那些认为PLINQ可以提升这个东西的人,但不是好的algorythm.PLINQ也可以在这里添加,因为所有步骤都可以轻松并行化.

我的代码:

static public void Main()
{
    List<List<int>> list = new List<List<int>>(){
      new List<int>() {0 ,1 ,2, 3, 4, 5, 6 },
      new List<int>() {0 ,1 ,2, 3, 4, 5, 6 },
      new List<int>() {0 ,1 ,4, 2, 4, 5, 6 },
      new List<int>() {0 ,3 ,2, 5, 1, 6, 4 }
    };
    var hashList = list.Select((l, ind) =>
    {
        uint hash = 0;
        for (int i = 0; i < l.Count; i++)
        {
            uint el = (uint)l[i];
            hash ^= (el << i) | (el >> (32 - i));
        }
        return new {hash, ind};
    }).OrderBy(l => l.hash).ToList();
    //hashList.Sort();
    uint prevHash = hashList[0].hash;
    int firstInd = 0;            
    for (int i = 1; i <= hashList.Count; i++)
    {
        if (i == hashList.Count || hashList[i].hash != prevHash)
        {
            for (int n = firstInd; n < i; n++)
                for (int m = n + 1; m < i; m++)
                {
                    List<int> x = list[hashList[n].ind];
                    List<int> y = list[hashList[m].ind];
                    if (x.Count == y.Count && x.SequenceEqual(y))
                        Console.WriteLine("Dupes: {0} and {1}", hashList[n].ind, hashList[m].ind);
                }                    
        }
        if (i == hashList.Count)
            break;
        if (hashList[i].hash != prevHash)
        {
            firstInd = i;
            prevHash = hashList[i].hash;
        }
    }
}

Run Code Online (Sandbox Code Playgroud)

归档时间：	15 年，10 月前
查看次数：	714 次
最近记录：	15 年，10 月前