当我组合2个哈希集时,HashSet.Union vs 之间的区别是什么 HashSet.Unionwith.
我想这样组合:
HashSet<EngineType> enginesSupportAll = _filePolicyEvaluation.EnginesSupportAll;
enginesSupportAll = enginesSupportAll != null ? new HashSet<EngineType>(engines.Union(enginesSupportAll)) : enginesSupportAll;
Run Code Online (Sandbox Code Playgroud)
这个例子的最佳方法是什么?为什么?
Tim*_*ter 14
好吧,它不是HashSet.Union,但是Enumerable.Union,让你使用的是与任何类型的工作的LINQ扩展方法IEnumerable<>,而HashSet.UnionWith是真正的HashSet是modifes当前实例方法.
Union 返回一个 IEnumerable<TSource>UnionWith是void,它修改当前HashSet实例UnionWith稍微更高效,因为它可以被优化如果您不希望在方法中支持任何类型的序列,那么HashSet修复它并且您可以修改它,使用它,否则使用LINQ扩展.如果你HashSet为此目的创建实例它并不重要,我希望LINQ更灵活,并能够链接我的查询.
Kas*_*erg 14
鉴于一个和乙有四种方法来一个 ∪ 乙:HashSet<T> HashSet<T>
new HashSet<t>(A.Union(B))HashSet<T&>(IEnumerable<T>)和
Enumerable.Union<T>(IEnumerable<T>, IEnumerable<T>))A.UnionWith(B)HashSet<T> C = new HashSet<T>(A); C.UnionWith(B);new HashSet<t>(A.Concat(B))Enumerable.Concat<T>(IEnumerable<T>, IEnumerable<T>))每个都有其优点和缺点:
HashSet而2和3是语句或语句块的表达式.from x in setofSetsA as IEnumerable<HashSet<T>>
from y in setOfSetsB as IEnumerable<HashSet<T>>
select x.UnionWith(y)因为UnionWith返回void 将无法工作.计算成本:
A.UnionWith(B)
(≈O((log(|A∪B|) - log(| A |))*|A∪B|)+ O(| B |))
≤
HashSet<T> C = new HashSet<T>(A); C.UnionWith(B);
(≈O((log(|A∪B|) - log(| A |))*|A∪B|)+ O(| A | + | B |))
≤
HashSet<T>(A.Concat(B))
(≈O(log(|A∪B|)*|A∪B|)+ O(| A | + | B |))
≤
HashSet<T>(A.Union(B))
(≈2*O(log(|A∪B|)*|A∪B|)+ O(| A | + | B | + |A∪B|))
下一节将深入研究参考源,以查看这些性能估计的基础.
HashSet<T>在联合选项1,3和4中,构造函数HashSet<T>(IEnumerable<T>, IEqualityComparer<T>)用于创建一个HashSet<T>from IEnumerable<T>.如果传递IEnumerable<T>的Count属性为 -ie,如果它是ICollection<T>- ,则此属性用于设置新的大小HashSet:
Run Code Online (Sandbox Code Playgroud)int suggestedCapacity = 0; ICollection<T> coll = collection as ICollection<T>; if (coll != null) { suggestedCapacity = coll.Count; } Initialize(suggestedCapacity);
[Count()][10]永远不会调用该方法.因此,如果IEnumerable可以毫不费力地检索计数,则用于预留容量; 否则,HashSet在添加新元素时增长并重新分配.
因此,在选项1 A.Union(B)和选项4 A.Concat(B)中ICollection<T>,创建的HashSet将不会增长并重新分配一些(≈log(|A∪B|))次.选项3可以使用Count的甲.
构造函数调用UnionWith以填充新的空HashSet:
Run Code Online (Sandbox Code Playgroud)this.UnionWith(collection);
UnionWith(IEnumerable<T>)迭代IEnumerable<T>传递的参数中的元素并调用AddIfNotPresent(T)每个元素.
AddIfNotPresent(T)插入元素并确保重复项永远不会插入到集合中.
HashSet<T>实现为一个插槽阵列m_slots,以及一个桶阵列m_buckets.存储桶只包含数组的int索引m_slots.每个桶的SlotS IN m_slots形式链表与索引到下一个Slot中m_slots.
AddIfNotPresent(T) 跳转到正确的存储桶,然后遍历其链接列表以检查该元素是否已存在:
Run Code Online (Sandbox Code Playgroud)for (int i = m_buckets[hashCode % m_buckets.Length] - 1; i >= 0; i = m_slots[i].next) { if (m_slots[i].hashCode == hashCode && m_comparer.Equals(m_slots[i].value, value)) { return false; } }
接下来找到一个空闲索引并保留一个插槽.首先m_freelist,检查空闲插槽列表.当空闲列表中没有插槽时,将m_slots使用阵列中的下一个空插槽.IncreaseCapacity()如果空闲列表中没有插槽且没有空插槽,则保留更多容量(via ):
Run Code Online (Sandbox Code Playgroud)int index; if (m_freeList >= 0) { index = m_freeList; m_freeList = m_slots[index].next; } else { if (m_lastIndex == m_slots.Length) { IncreaseCapacity(); // this will change during resize bucket = hashCode % m_buckets.Length; } index = m_lastIndex; m_lastIndex++; }
AddIfNotPresent(T)有三个操作需要一些计算:调用object.GetHashCode(),object.Equals(object)在发生碰撞时调用,以及IncreaseCapacity().实际添加元素只会产生设置一些指针和一些整数的成本.
当容量HashSet<T>需求IncreaseCapacity()至少翻倍时.因此,我们可以得出结论,平均a HashSet<T>填充了75%.如果散列均匀分布,则哈希冲突的预期也为75%.
SetCapacity(int, bool),被称为IncreaseCapacity(),是最昂贵的:它分配新的数组,它将旧的插槽阵列复制到新的数组,并重新计算存储桶列表:
Run Code Online (Sandbox Code Playgroud)Slot[] newSlots = new Slot[newSize]; if (m_slots != null) { Array.Copy(m_slots, 0, newSlots, 0, m_lastIndex); } ... int[] newBuckets = new int[newSize]; for (int i = 0; i < m_lastIndex; i++) { int bucket = newSlots[i].hashCode % newSize; newSlots[i].next = newBuckets[bucket] - 1; newBuckets[bucket] = i + 1; } m_slots = newSlots; m_buckets = newBuckets;
选项1和4(new HashSet<T>(A.Union(B)))将导致稍多的调用IncreaseCapacity().没有成本A.Union(B)或A.Concat(B)- 的成本约为O(log(|A∪B|)*|A∪B|).
当使用选项2(A.UnionWith(B))或选项3(HashSet<T> C = new HashSet<T>(A); C.UnionWith(B))时,我们在成本上得到log(| A |)的"折扣":O((log(|A∪B|) - log(| A |))*| A∪B|).它(稍微)支付使用最大的集合作为目标进入另一个被合并的女巫.
Enumerable<T>.Union(IEnumerable<T>)Enumerable<T>.Union(IEnumerable<T>)通过实现UnionIterator<T>(IEnumerable<T>, IEnumerable<T>, IEqualityComparer<T>).
在UnionIterator使用Set<T>-an内部类Enumerable.cs-这是非常相似的HashSet<T>. UnionIterator懒惰地Add(T)š从物品甲和乙这Set<T>和yields元素如果它们可以被添加.完成的工作Find(T, bool)类似于HashSet<T>.AddIfNotPresent(T).检查元素是否已存在:
Run Code Online (Sandbox Code Playgroud)int hashCode = InternalGetHashCode(value); for (int i = buckets[hashCode % buckets.Length] - 1; i >= 0; i = slots[i].next) { if (slots[i].hashCode == hashCode && comparer.Equals(slots[i].value, value)) return true; }
找一个免费索引并保留一个插槽:
Run Code Online (Sandbox Code Playgroud)int index; if (freeList >= 0) { index = freeList; freeList = slots[index].next; } else { if (count == slots.Length) Resize(); index = count; count++; } int bucket = hashCode % buckets.Length; slots[index].hashCode = hashCode; slots[index].value = value; slots[index].next = buckets[bucket] - 1; buckets[bucket] = index + 1;
Resize()类似于IncreaseCapacity().两者之间的最大区别在于Resize()不使用素数作为桶的数量,因此如果不好GetHashCode()则碰撞的可能性稍高.代码Resize():
Run Code Online (Sandbox Code Playgroud)int newSize = checked(count * 2 + 1); int[] newBuckets = new int[newSize]; Slot[] newSlots = new Slot[newSize]; Array.Copy(slots, 0, newSlots, 0, count); for (int i = 0; i < count; i++) { int bucket = newSlots[i].hashCode % newSize; newSlots[i].next = newBuckets[bucket] - 1; newBuckets[bucket] = i + 1; } buckets = newBuckets; slots = newSlots;
性能成本与之A.Union(B)没有显着差异HashSet<T> C = new HashSet<T>(); C.UnionWith(A); C.UnionWith(B);.在选项1(new HashSet<T>(A.Union(B)))中,HashSet创建两次相同的结果导致非常昂贵的2*O(log(|A∪B|)*(|A∪B|)).选项从知道如何4个结果HashSet<T>(IEnumerable<T>)和Enumerable.Union(IEnumerable<T>, IEnumerable<T>)实现.它避免了冗余A.Union(B)导致O(log(|A∪B|)*|A∪B|)的成本.