mbi*_*ard 154 .net collections comparison equality
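I want to compare two collections (in C#), but I'm not sure of the best way to implement this efficiently.
I've read the other thread about Enumerable.SequenceEqual, but it's not exactly what I'm looking for.
In my case, two collections are equal if they both contain the same items (no matter the order).
Example: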
collection1 = {1, 2, 3, 4};
collection2 = {2, 4, 1, 3};
collection1 == collection2; // true
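What I usually do is loop through each item of one collection and check whether it exists in the other collection, then loop through each item of the other collection and check whether it exists in the first one. (I start by comparing the lengths.)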
if (collection1.Count != collection2.Count)
    return false; // the collections are not equal
foreach (Item item in collection1)
{
    if (!collection2.Contains(item))
        return false; // the collections are not equal
}
foreach (Item item in collection2)
{
    if (!collection1.Contains(item))
        return false; // the collections are not equal
}
return true; // the collections are equal
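However, this is not entirely correct, and it's probably not the most efficient way to compare two collections for equality.
An example I can think of where it gives the wrong answer: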
collection1 = {1, 2, 3, 3, 4}
collection2 = {1, 2, 2, 3, 4}
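My implementation would consider these two equal. Should I just count the number of times each item is found and make sure the counts are equal in both collections?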
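To make the failure concrete, here is a minimal sketch that puts the membership-only check next to a counting check. The class and method names (`NaiveVersusCounting`, `MembershipOnly`, `SameCounts`) are made up for illustration and are not part of the question, and the sketch is limited to `int` items to keep it short.

```csharp
using System.Collections.Generic;

public static class NaiveVersusCounting
{
    // The membership-only check from the question: it ignores how often an item occurs.
    public static bool MembershipOnly(ICollection<int> a, ICollection<int> b)
    {
        if (a.Count != b.Count) return false;
        foreach (int item in a)
            if (!b.Contains(item)) return false;
        foreach (int item in b)
            if (!a.Contains(item)) return false;
        return true;
    }

    // Counting check: each value must occur the same number of times in both collections.
    public static bool SameCounts(ICollection<int> a, ICollection<int> b)
    {
        if (a.Count != b.Count) return false;

        var counts = new Dictionary<int, int>();
        foreach (int item in a)
        {
            counts.TryGetValue(item, out int n); // n stays 0 when the key is missing
            counts[item] = n + 1;
        }

        foreach (int item in b)
        {
            // An item missing from 'a', or seen more often in 'b', breaks equality.
            if (!counts.TryGetValue(item, out int n) || n == 0) return false;
            counts[item] = n - 1;
        }

        // Sizes are equal and no count ran short, so every count has reached zero.
        return true;
    }
}
```

With `{1, 2, 3, 3, 4}` and `{1, 2, 2, 3, 4}`, `MembershipOnly` reports the collections as equal while `SameCounts` does not, which suggests that counting occurrences is the right direction.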
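The examples are in some sort of C# (let's call it pseudo-C#), but give your answer in whatever language you wish; it does not matter.
Note: I used integers in the examples for simplicity, but I want to be able to use reference-type objects too (they do not behave correctly as keys because only the reference of the object is compared, not the content).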
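On the reference-type concern: `Dictionary<TKey, TValue>`, `HashSet<T>` and `List<T>.Contains` fall back to reference equality only when the type does not define its own equality. A minimal sketch, assuming a hypothetical `Person` class that is not part of the question, of a reference type that compares by content:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reference type, used only for illustration.
public sealed class Person : IEquatable<Person>
{
    public string Name { get; }
    public int Age { get; }

    public Person(string name, int age)
    {
        Name = name;
        Age = age;
    }

    // Two Person objects are equal when their contents match.
    public bool Equals(Person other) =>
        other != null && Name == other.Name && Age == other.Age;

    public override bool Equals(object obj) => Equals(obj as Person);

    // Equal objects must produce equal hash codes, or hash-based lookups will miss them.
    public override int GetHashCode()
    {
        unchecked
        {
            return ((Name?.GetHashCode() ?? 0) * 397) ^ Age;
        }
    }
}
```

With this in place, `new Dictionary<Person, int> { [new Person("Ann", 30)] = 1 }.ContainsKey(new Person("Ann", 30))` finds the key even though the two objects are distinct instances. When the type cannot be modified, passing a custom `IEqualityComparer<T>` to the dictionary is the usual alternative.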
Oha*_*der 110
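It turns out Microsoft already has this covered in its testing framework: CollectionAssert.AreEquivalent
Remarks
Two collections are equivalent if they have the same elements in the same quantity, but in any order. Elements are equal if their values are equal, not if they refer to the same object.
Using Reflector, I modified the code behind AreEquivalent() to create a corresponding equality comparer. It is more complete than the existing answers, since it takes nulls into account, implements IEqualityComparer, and has some efficiency and edge-case checks. Plus, it's Microsoft :)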
using System;
using System.Collections.Generic;
public class MultiSetComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    private readonly IEqualityComparer<T> m_comparer;
    public MultiSetComparer(IEqualityComparer<T> comparer = null)
    {
        m_comparer = comparer ?? EqualityComparer<T>.Default;
    }
    public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
    {
        if (first == null)
            return second == null;
        if (second == null)
            return false;
        if (ReferenceEquals(first, second))
            return true;
        // Cheap size checks when both inputs expose a Count
        if (first is ICollection<T> firstCollection && second is ICollection<T> secondCollection)
        {
            if (firstCollection.Count != secondCollection.Count)
                return false;
            if (firstCollection.Count == 0)
                return true;
        }
        return !HaveMismatchedElement(first, second);
    }
    private bool HaveMismatchedElement(IEnumerable<T> first, IEnumerable<T> second)
    {
        int firstNullCount;
        int secondNullCount;
        var firstElementCounts = GetElementCounts(first, out firstNullCount);
        var secondElementCounts = GetElementCounts(second, out secondNullCount);
        if (firstNullCount != secondNullCount || firstElementCounts.Count != secondElementCounts.Count)
            return true;
        foreach (var kvp in firstElementCounts)
        {
            var firstElementCount = kvp.Value;
            int secondElementCount;
            // A missing key leaves secondElementCount at 0, which counts as a mismatch
            secondElementCounts.TryGetValue(kvp.Key, out secondElementCount);
            if (firstElementCount != secondElementCount)
                return true;
        }
        return false;
    }
    // Nulls are counted separately because a Dictionary cannot use null as a key
    private Dictionary<T, int> GetElementCounts(IEnumerable<T> enumerable, out int nullCount)
    {
        var dictionary = new Dictionary<T, int>(m_comparer);
        nullCount = 0;
        foreach (T element in enumerable)
        {
            if (element == null)
            {
                nullCount++;
            }
            else
            {
                int num;
                dictionary.TryGetValue(element, out num);
                num++;
                dictionary[element] = num;
            }
        }
        return dictionary;
    }
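    public int GetHashCode(IEnumerable<T> enumerable)
    {
        if (enumerable == null) throw new ArgumentNullException(nameof(enumerable));
        // Order-independent sum of element hashes, computed with the same comparer
        // as Equals, so equivalent collections always hash alike and T does not
        // need to be sortable.
        int hash = 17;
        foreach (T val in enumerable)
            hash = unchecked(hash + (val == null ? 42 : m_comparer.GetHashCode(val)));
        return hash;
    }
}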
Sample usage:
var set = new HashSet<IEnumerable<int>>(new[] {new[]{1,2,3}}, new MultiSetComparer<int>());
Console.WriteLine(set.Contains(new [] {3,2,1})); //true
Console.WriteLine(set.Contains(new [] {1, 2, 3, 3})); //false
Or if you just want to compare two collections directly:
var comp = new MultiSetComparer<string>();
Console.WriteLine(comp.Equals(new[] {"a","b","c"}, new[] {"a","c","b"})); //true
Console.WriteLine(comp.Equals(new[] {"a","b","c"}, new[] {"a","b"})); //false
Finally, you can use an equality comparer of your choice:
var strcomp = new MultiSetComparer<string>(StringComparer.OrdinalIgnoreCase);
Console.WriteLine(strcomp.Equals(new[] {"a", "b"}, new []{"B", "A"})); //true
小智 93
A simple and fairly efficient solution is to sort both collections and then compare them for equality:
bool equal = collection1.OrderBy(i => i).SequenceEqual(
collection2.OrderBy(i => i));
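This algorithm is O(N*logN), while your solution above is O(N^2).
If the collections have certain properties, you may be able to implement a faster solution. For example, if both of your collections are hash sets, they cannot contain duplicates. Also, checking whether a hash set contains some element is very fast. In that case, an algorithm similar to yours would likely be the fastest.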
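For the hash set case, the framework already provides the comparison as `HashSet<T>.SetEquals`. A minimal sketch, with a wrapper class and method name (`HashSetComparison.SameElements`) invented for illustration:

```csharp
using System.Collections.Generic;

public static class HashSetComparison
{
    // Assumes duplicates do not matter: both inputs are already sets.
    public static bool SameElements<T>(HashSet<T> first, HashSet<T> second)
    {
        // SetEquals ignores order and uses the comparer of 'first' to match elements.
        return first.SetEquals(second);
    }
}
```

For example, `SameElements(new HashSet<int> { 1, 2, 3, 4 }, new HashSet<int> { 2, 4, 1, 3 })` returns true. This only answers the question for collections without duplicates.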
Dan*_*ngs 31
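Create a Dictionary "dict" and then, for each member in the first collection, do dict[member]++;
Then, loop over the second collection in the same way, but for each member do dict[member]--.
At the end, loop over all of the members in the dictionary and check that every count is zero: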
private bool SetEqual (List<int> left, List<int> right) {
    if (left.Count != right.Count)
        return false;
    Dictionary<int, int> dict = new Dictionary<int, int>();
    foreach (int member in left) {
        if (dict.ContainsKey(member) == false)
            dict[member] = 1;
        else
            dict[member]++;
    }
    foreach (int member in right) {
        if (dict.ContainsKey(member) == false)
            return false;
        else
            dict[member]--;
    }
    foreach (KeyValuePair<int, int> kvp in dict) {
        if (kvp.Value != 0)
            return false;
    }
    return true;
}
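Edit: As far as I can tell, this is on the same order as the most efficient algorithm. This algorithm is O(N), assuming that the Dictionary uses O(1) lookups.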
mbi*_*ard 18
This is my (heavily influenced by D. Jennings) generic implementation of the comparison method (in C#):
/// <summary>
/// Represents a service used to compare two collections for equality.
/// </summary>
/// <typeparam name="T">The type of the items in the collections.</typeparam>
public class CollectionComparer<T>
{
    /// <summary>
    /// Compares the content of two collections for equality.
    /// </summary>
    /// <param name="foo">The first collection.</param>
    /// <param name="bar">The second collection.</param>
    /// <returns>True if both collections have the same content, false otherwise.</returns>
    public bool Execute(ICollection<T> foo, ICollection<T> bar)
    {
        // Declare a dictionary to count the occurrences of the items in the collection
        Dictionary<T, int> itemCounts = new Dictionary<T, int>();
        // Increase the count for each occurrence of the item in the first collection
        foreach (T item in foo)
        {
            if (itemCounts.ContainsKey(item))
            {
                itemCounts[item]++;
            }
            else
            {
                itemCounts[item] = 1;
            }
        }
        // Decrease the count for each occurrence of the item in the second collection
        foreach (T item in bar)
        {
            // Dictionary keys are compared with EqualityComparer<T>.Default, which
            // calls Equals and GetHashCode. Override both on "T" to define what it
            // means for two "T" objects to be equal.
            int count;
            if (itemCounts.TryGetValue(item, out count))
            {
                itemCounts[item] = count - 1;
            }
            else
            {
                // There was no occurrence of this item in the first collection, thus the collections are not equal
                return false;
            }
        }
        // The count of each item should be 0 if the contents of the collections are equal
        foreach (int value in itemCounts.Values)
        {
            if (value != 0)
            {
                return false;
            }
        }
        // The collections are equal
        return true;
    }
}
static bool SetsContainSameElements<T>(IEnumerable<T> set1, IEnumerable<T> set2) {
    var setXOR = new HashSet<T>(set1);
    setXOR.SymmetricExceptWith(set2);
    return (setXOR.Count == 0);
}
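This solution requires .NET 3.5 and the System.Collections.Generic namespace. According to Microsoft, SymmetricExceptWith is an O(n + m) operation, where n is the number of elements in the first set and m is the number of elements in the second. You can always add an equality comparer to this function if necessary.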
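One way the comparer could be added is sketched below. The extra `comparer` parameter and the `SetComparison` class name are assumptions for illustration, not part of the answer. Because the inputs are loaded into a `HashSet<T>`, duplicates are ignored, so this treats both inputs as sets.

```csharp
using System.Collections.Generic;

public static class SetComparison
{
    // Same approach as above, with a caller-supplied equality comparer (assumed signature).
    public static bool SetsContainSameElements<T>(
        IEnumerable<T> set1, IEnumerable<T> set2, IEqualityComparer<T> comparer)
    {
        // The comparer given to the HashSet decides which elements count as "the same".
        var setXOR = new HashSet<T>(set1, comparer);

        // Keeps only elements that appear in exactly one of the two inputs.
        setXOR.SymmetricExceptWith(set2);

        return setXOR.Count == 0;
    }
}
```

For example, `SetsContainSameElements(new[] { "a", "b" }, new[] { "B", "A" }, StringComparer.OrdinalIgnoreCase)` returns true, while the same call with `StringComparer.Ordinal` returns false.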
小智 5
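EDIT: I realized as soon as I posted that this really only works for sets; it will not properly deal with collections that have duplicate items. Because Intersect yields only distinct elements, a collection with duplicates such as { 1, 1, 2 } never passes the count check, even when compared with itself. If your collections are sets (or their equality can be measured that way), however, I hope you find the below useful.
The solution I use is: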
return c1.Count == c2.Count && c1.Intersect(c2).Count() == c1.Count;
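Linq does the dictionary thing under the covers, so this is also O(N). (Note that it's O(1) if the collections aren't the same size.)
I did a sanity check using the "SetEqual" method suggested by Daniel, the OrderBy/SequenceEquals method suggested by Igor, and my suggestion. The results are below, showing O(N*LogN) for Igor and O(N) for mine and Daniel's.
I think the simplicity of the Linq intersect code makes it the preferable solution.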
__Test Latency(ms)__
N, SetEquals, OrderBy, Intersect
1024, 0, 0, 0
2048, 0, 0, 0
4096, 31.2468, 0, 0
8192, 62.4936, 0, 0
16384, 156.234, 15.6234, 0
32768, 312.468, 15.6234, 46.8702
65536, 640.5594, 46.8702, 31.2468
131072, 1312.3656, 93.7404, 203.1042
262144, 3765.2394, 187.4808, 187.4808
524288, 5718.1644, 374.9616, 406.2084
1048576, 11420.7054, 734.2998, 718.6764
2097152, 35090.1564, 1515.4698, 1484.223
In the case of no repeats and no order, the following EqualityComparer can be used to allow collections as dictionary keys:
public class SetComparer<T> : IEqualityComparer<IEnumerable<T>>
    where T : IComparable<T>
{
    public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
    {
        if (first == second)
            return true;
        if ((first == null) || (second == null))
            return false;
        return first.ToHashSet().SetEquals(second);
    }
    public int GetHashCode(IEnumerable<T> enumerable)
    {
        int hash = 17;
        foreach (T val in enumerable.OrderBy(x => x))
            hash = hash * 23 + val.GetHashCode();
        return hash;
    }
}
Here is the ToHashSet() implementation I've used. The hash code algorithm comes from Effective Java (by way of Jon Skeet).
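The linked ToHashSet() implementation is not reproduced in this thread. As a rough sketch of what such a helper typically looks like, and not the author's actual code, an extension method can simply wrap the `HashSet<T>` constructor. The name `ToHashSetCompat` is hypothetical, chosen here to avoid clashing with the `Enumerable.ToHashSet` method that newer .NET versions ship in System.Linq.

```csharp
using System.Collections.Generic;

public static class EnumerableSetExtensions
{
    // Hypothetical helper: copies any sequence into a HashSet, dropping duplicates.
    public static HashSet<T> ToHashSetCompat<T>(this IEnumerable<T> source)
    {
        return new HashSet<T>(source);
    }
}
```

On a framework that already provides `Enumerable.ToHashSet`, the comparer above works without any extra helper.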
If you use Shouldly, you can use ShouldAllBe with Contains.
collection1 = {1, 2, 3, 4};
collection2 = {2, 4, 1, 3};
collection1.ShouldAllBe(item=>collection2.Contains(item)); // true
And finally, you can write an extension.
public static class ShouldlyIEnumerableExtensions
{
    public static void ShouldEquivalentTo<T>(this IEnumerable<T> list, IEnumerable<T> equivalent)
    {
        list.ShouldAllBe(l => equivalent.Contains(l));
    }
}
UPDATE
An optional parameter exists on the ShouldBe method.
collection1 = {1, 2, 3, 4};
collection2 = {2, 4, 1, 3};
collection1.ShouldBe(collection2, ignoreOrder: true); // true