new KeyValuePair <UInt32,UInt32>(i,j).GetHashCode(); 高重复率

pap*_*zzo 13 .net dictionary gethashcode

为了寻找词典的快速复合键,我遇到异常,我无法理解也无法证明.

在有限的测试中

Dictionary<KeyValuePair<UInt32, UInt32>, string>
Run Code Online (Sandbox Code Playgroud)

明显慢于(200:1)

Dictionary<KeyValuePair<UInt16, UInt16>, string>
Run Code Online (Sandbox Code Playgroud)

测试两个循环,从0到1000 Populate,然后包含ContainsKey

         Poplulate     ContainsKey  
UInt32    92085         86578  
UInt16     2201           431
Run Code Online (Sandbox Code Playgroud)

问题是

new KeyValuePair<UInt32, UInt32>(i, j).GetHashCode();
Run Code Online (Sandbox Code Playgroud)

产生许多重复.
在循环i和j 1024中,仅创建1024个唯一散列值.

基于来自CasperOne的雪崩评论尝试了i*31和j*97(两个素数),这导致105280在1024X1024上独一无二.仍然有很多重复.CasperOne我知道这与随机不一样.但随机输入并不是我的工作.GetHashCode()应该随机化输出.

为什么重复次数很多?

相同的循环

new KeyValuePair<UInt16, UInt16>(i, j).GetHashCode();
Run Code Online (Sandbox Code Playgroud)

产生1024 X 1024唯一哈希码(完美).

Int32有同样的问题.

这些重复哈希值会终止

Dictionary<KeyValuePair<UInt32, UInt32>, string> 
Run Code Online (Sandbox Code Playgroud)

与Int16相比,元组还会生成很多重复项,在Int32中不会降级.

生成原始KVP和原始KPV.GetHashCode的时间类似.

与HashSet相同的异常.

Dictionary<KeyValuePair<UInt32, UInt32>, string> dKVPu32 = new Dictionary<KeyValuePair<UInt32, UInt32>, string>();
Dictionary<KeyValuePair<UInt16, UInt16>, string> dKVPu16 = new Dictionary<KeyValuePair<UInt16, UInt16>, string>();
KeyValuePair<UInt32, UInt32> kvpUint32;
KeyValuePair<UInt16, UInt16> kvpUint16;
int range = 1000;
Int32 hashCode;
HashSet<Int32> kvpUint32Hash = new HashSet<Int32>();
HashSet<Int32> kvpUint16Hash = new HashSet<Int32>();

Stopwatch sw = new Stopwatch();
sw.Start();
for (UInt32 i = 0; i < range; i++)
{
    for (UInt32 j = 0; j < range; j++)
    {
        kvpUint32 = new KeyValuePair<UInt32, UInt32>(i, j);
    }
}
Console.WriteLine("UInt32  raw " + sw.ElapsedMilliseconds.ToString());
//  7
sw.Restart();
for (UInt16 i = 0; i < range; i++)
{
    for (UInt16 j = 0; j < range; j++)
    {
        kvpUint16 = new KeyValuePair<UInt16, UInt16>(i, j);
    }
}
Console.WriteLine("UInt16  raw " + sw.ElapsedMilliseconds.ToString());
//  6
sw.Restart();
for (UInt32 i = 0; i < range; i++)
{
    for (UInt32 j = 0; j < range; j++)
    {
        hashCode = new KeyValuePair<UInt32, UInt32>(i, j).GetHashCode();
        kvpUint32Hash.Add(hashCode);
    }
}
Console.WriteLine("UInt32  GetHashCode " + sw.ElapsedMilliseconds.ToString() + "  unique count " + kvpUint32Hash.Count.ToString());
//  285   1024
sw.Restart();
for (UInt16 i = 0; i < range; i++)
{
    for (UInt16 j = 0; j < range; j++)
    {
        hashCode = new KeyValuePair<UInt16, UInt16>(i, j).GetHashCode();
        kvpUint16Hash.Add(hashCode);
    }
}
Console.WriteLine("UInt16  GetHashCode " + sw.ElapsedMilliseconds.ToString() + "  unique count " + kvpUint16Hash.Count.ToString());
//  398 1000000
sw.Restart();
Console.ReadLine();
for (UInt32 i = 0; i < range; i++)
{
    for (UInt32 j = 0; j < range; j++)
    {
        dKVPu32.Add(new KeyValuePair<UInt32, UInt32>(i, j), String.Format("{0} {1}", i.ToString(), j.ToString()));
    }
}
Console.WriteLine("hsKVPu32 pop " + sw.ElapsedMilliseconds.ToString());
//  92085
sw.Restart();
for (UInt32 i = 0; i < range; i++)
{
    for (UInt32 j = 0; j < range; j++)
    {
        if (!dKVPu32.ContainsKey(new KeyValuePair<UInt32, UInt32>(i, j))) Debug.WriteLine("Opps"); ;
    }
}
Console.WriteLine("hsKVPu32 find " + sw.ElapsedMilliseconds.ToString());
//  86578
dKVPu32.Clear();
dKVPu32 = null;
GC.Collect();
sw.Restart();
for (UInt16 i = 0; i < range; i++)
{
    for (UInt16 j = 0; j < range; j++)
    {
        dKVPu16.Add(new KeyValuePair<UInt16, UInt16>(i, j), String.Format("{0} {1}", i.ToString(), j.ToString()));
    }
}
Console.WriteLine("hsKVPu16 pop " + sw.ElapsedMilliseconds.ToString());
//   2201
sw.Restart();
for (UInt16 i = 0; i < range; i++)
{
    for (UInt16 j = 0; j < range; j++)
    {
        if (!dKVPu16.ContainsKey(new KeyValuePair<UInt16, UInt16>(i, j))) Debug.WriteLine("Opps"); ;
    }
}
sw.Stop();
Console.WriteLine("hsKVPu16 find " + sw.ElapsedMilliseconds.ToString());
//  431
Run Code Online (Sandbox Code Playgroud)

PS最快的是打包.EG((UInt32)int1 << 16)| INT2;

第一个UInt32列的哈希值等于接下来两个的KVP哈希值.

2281371105 8 992
2281371104 8 993
2281371107 8 994

2281371145 0 0
2281371147 0 2
2281371149 0 4
2281371151 0 6
2281371137 0 8

2281371144 0 1
2281371146 0 3
2281371148 0 5
2281371150 0 7
2281371136 0 9

2281371144 1 0
2281371145 1 1
2281371146 1 2
2281371147 1 3
2281371148 1 4
2281371149 1 5
2281371150 1 6
2281371151 1 7
2281371136 1 8
2281371137 1 9

2281371147 2 0
2281371146 2 1
2281371144 2 3
2281371151 2 4
2281371150 2 5
2281371149 2 6
2281371148 2 7
2281371139 2 8

我发现的唯一模式是总和或差异或KVP匹配.
但无法找到何时求和以及何时减去的模式.
这是一个糟糕的哈希,所以知道它是什么没什么价值.

Dav*_*eau 8

由于GetHashCode返回a Int32,每对Int16s(或UInt16s)都可以轻松返回唯一值.使用一对Int32s,您需要以某种方式组合这些值以与您的设计兼容.

KeyValuePair不会覆盖GetHashCode(),因此您只是使用默认实现ValueType.GetHashCode(),并且其文档说明如下:

(来自:http://msdn.microsoft.com/en-us/library/system.valuetype.gethashcode.aspx)

如果调用派生类型的GetHashCode方法,则返回值可能不适合用作哈希表中的键.此外,如果这些字段中的一个或多个字段的值发生更改,则返回值可能不适合用作哈希表中的键.在任何一种情况下,请考虑编写自己的GetHashCode方法实现,该方法更接近地表示该类型的哈希代码的概念.

由于KeyValuePair没有覆盖GetHashCode(),我认为它不打算用作Dictionary密钥.

此外,根据这个问题这个C#代码,默认实现ValueType.GetHashCode()只是选择第一个非静态字段,并返回其GetHashCode()方法的结果.这解释了大量的重复KeyValuePair<UInt32, UInt32>,虽然它没有解释缺少重复KeyValuePair<UInt16, UInt16>.

我的猜测是,对KeyValuePair<UInt32, UInt32>,GetHashCode()不只是返回GetHashCode()的第一个值,并且对于KeyValuePair<UInt16, UInt16>,GetHashCode()被合并从而为每对值一个唯一的哈希值,因为它是可能的,straightfoward这样做.


Jon*_*eet 7

首先,我们可以省去这方面的时间方面 - 我觉得这真的只是哈希冲突,因为显然那些会破坏性能.

所以,真正的问题是,为什么有更多的哈希冲突KeyValuePair<uint, uint>KeyValuePair<ushort, ushort>.为了帮助我找到更多相关信息,我编写了以下简短程序:

using System;
using System.Collections.Generic;

class Program
{
    const int Sample1 = 100;
    const int Sample2 = 213;

    public static void Main()
    {
        Display<uint, ushort>();
        Display<ushort, ushort>();
        Display<uint, uint>();
        Display<ushort, uint>();
    }

    static void Display<TKey, TValue>()
    {
        TKey key1 = (TKey) Convert.ChangeType(Sample1, typeof(TKey));
        TValue value1 = (TValue) Convert.ChangeType(Sample1, typeof(TValue));
        TKey key2 = (TKey) Convert.ChangeType(Sample2, typeof(TKey));
        TValue value2 = (TValue) Convert.ChangeType(Sample2, typeof(TValue));

        Console.WriteLine("Testing {0}, {1}", typeof(TKey).Name, typeof(TValue).Name);
        Console.WriteLine(new KeyValuePair<TKey, TValue>(key1, value1).GetHashCode());
        Console.WriteLine(new KeyValuePair<TKey, TValue>(key1, value2).GetHashCode());
        Console.WriteLine(new KeyValuePair<TKey, TValue>(key2, value1).GetHashCode());
        Console.WriteLine(new KeyValuePair<TKey, TValue>(key2, value2).GetHashCode());
        Console.WriteLine();
    }
}
Run Code Online (Sandbox Code Playgroud)

我机器上的输出是:

Testing UInt32, UInt16
-1888265981
-1888265981
-1888265806
-1888265806

Testing UInt16, UInt16
-466800447
-459525951
-466800528
-459526032

Testing UInt32, UInt32
958334947
958334802
958334802
958334947

Testing UInt16, UInt32
-1913331935
-1913331935
-1913331935
-1913331935
Run Code Online (Sandbox Code Playgroud)

显然,您可以尝试更改样本值以查看存在冲突的位置.

结果KeyValuePair<ushort, uint>特别令人担忧,结果KeyValuePair<ushort, ushort>令人惊讶地好.

其实,KeyValuePair<ushort, uint>不只是坏-这是可笑糟糕,因为据我可以看到-我还没有找到任何运行64位CLR时不具有-1913331935相同的散列码值.运行32位CLR我得到一个不同的哈希码,但所有值仍然是相同的哈希码.

似乎在.NET 4.5(我正在运行的)中,默认实现GetHashCode不仅仅采用结构的第一个实例字段,如前所述.我怀疑至少对于某些类型,它只使用了盒装值中标题之外的前4个字节的内存(并且这里将有每次调用的装箱),并且最终有时只是第一个字段(如果是字段是a uint),有时是多个字段(例如ushort, ushort,两个字段都可以适合"内部"4个字节),有时根本不是字段(ushort, uint).

(实际上,这并不能解释为什么在这种uint, uint情况下你得到1024个不同的哈希码而不是1000个.我仍然不确定.)

最终,使用不会覆盖GetHashCode作为字典键的值类型似乎只是一个坏主意,除非您已经过测试以确保它适合您的特定要求.IMO,有太多的黑魔法对它充满信心.