C# fast 除了排序列表的方法

Question

C# fast 除了排序列表的方法

我正在开发一个瓶颈为 list1.Except(list2) 的应用程序。来自这篇文章：在 Linq 中处理 HashSet 时，我应该使用“除外”或“包含”，但“除外”的复杂度为 O(m+n)（m 和 n 代表列表的大小）。但是，我的列表已排序。这有帮助吗？

我能想到的第一个实现：

foreach element in list2 (m operations)
    look for it in list1 (ln(n) operations)
    if present 
        set it to null (O(1), removing has a O(n))
    else continue

Run Code Online (Sandbox Code Playgroud)

它的复杂度为 O(m*ln(n))，这在 m 小而 n 大时非常有趣（这正是我的数据集的情况：m 大约为 50，n 大约为 1 000 000）。然而，它产生空值的事实可能对使用它的函数有很多影响......有没有办法保持这种复杂性，而不必编写空值（然后跟踪它们）

任何帮助将不胜感激！

Answer 1

Lou*_*cci 2

using System;
using System.Collections.Generic;

public class Test
{
    public static void Main()
    {
        var listM = new List<int>();
        var listN = new List<int>();
        for(int i = 0, x = 0; x < 50; i+=13, x++) { 
            listM.Add(i);
        }
        for(int i = 0, x = 0; x < 10000; i+=7, x++) { 
            listN.Add(i);
        }
        Console.WriteLine(SortedExcept(listM, listN).Count);
    }

    public static List<T> SortedExcept<T>(List<T> m, List<T> n) {
        var result = new List<T>();
        foreach(var itm in m) {
            var index = n.BinarySearch(itm);
            if(index < 0) { 
                result.Add(itm); 
            }
        }
        return result;
    }
}

Run Code Online (Sandbox Code Playgroud)

编辑这也是 O(M + N) 版本

public static List<T> SortedExcept2<T>(List<T> m, List<T> n) where T : IComparable<T> {
    var result = new List<T>();
    int i = 0, j = 0;
    if(n.Count == 0) {
        result.AddRange(m);
        return result;
    }
    while(i < m.Count) {
        if(m[i].CompareTo(n[j]) < 0) {
            result.Add(m[i]);
            i++;
        } else if(m[i].CompareTo(n[j]) > 0) {
            j++;
        } else {
            i++;
        }
        if(j >= n.Count) {
            for(; i < m.Count; i++) { 
                result.Add(m[i]); 
            }
            break;
        }
    }
    return result;
}

Run Code Online (Sandbox Code Playgroud)

在快速和肮脏的基准测试中 http://ideone.com/Y2oEQD M + N 总是更快，即使 N 是 1000 万。BinarySearch 会受到惩罚，因为它以非线性方式访问数组内存；这会导致缓存未命中，从而减慢算法速度，因此 N 越大，BinarySearch 的内存访问惩罚就越多。

归档时间：	10 年，6 月前
查看次数：	1360 次
最近记录：	10 年，6 月前