Why are HashSets of structs with nullable values incredibly slow?

Kob*_*obi 69 .net c# performance struct

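I investigated a performance degradation and tracked it down to slow HashSets.
I have structs with nullable values that are used as primary keys. For example: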

public struct NullableLongWrapper
{
    private readonly long? _value;

    public NullableLongWrapper(long? value)
    {
        _value = value;
    }
}

I noticed that creating a HashSet<NullableLongWrapper> is exceptionally slow.

Here is an example using BenchmarkDotNet (Install-Package BenchmarkDotNet):

using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

public class Program
{
    static void Main()
    {
        BenchmarkRunner.Run<HashSets>();
    }
}

public class Config : ManualConfig
{
    public Config()
    {
        Add(Job.Dry.WithWarmupCount(1).WithLaunchCount(3).WithTargetCount(20));
    }
}

public struct NullableLongWrapper
{
    private readonly long? _value;

    public NullableLongWrapper(long? value)
    {
        _value = value;
    }

    public long? Value => _value;
}

public struct LongWrapper
{
    private readonly long _value;

    public LongWrapper(long value)
    {
        _value = value;
    }

    public long Value => _value;
}

[Config(typeof (Config))]
public class HashSets
{
    private const int ListSize = 1000;

    private readonly List<long?> _nullables;
    private readonly List<long> _longs;
    private readonly List<NullableLongWrapper> _nullableWrappers;
    private readonly List<LongWrapper> _wrappers;

    public HashSets()
    {
        _nullables = Enumerable.Range(1, ListSize).Select(i => (long?) i).ToList();
        _longs = Enumerable.Range(1, ListSize).Select(i => (long) i).ToList();
        _nullableWrappers = Enumerable.Range(1, ListSize).Select(i => new NullableLongWrapper(i)).ToList();
        _wrappers = Enumerable.Range(1, ListSize).Select(i => new LongWrapper(i)).ToList();
    }

    [Benchmark]
    public void Longs() => new HashSet<long>(_longs);

    [Benchmark]
    public void NullableLongs() => new HashSet<long?>(_nullables);

    [Benchmark(Baseline = true)]
    public void Wrappers() => new HashSet<LongWrapper>(_wrappers);

    [Benchmark]
    public void NullableWrappers() => new HashSet<NullableLongWrapper>(_nullableWrappers);
}

Results:

           Method |          Median |   Scaled
----------------- |---------------- |---------
            Longs |      22.8682 us |     0.42
    NullableLongs |      39.0337 us |     0.62
         Wrappers |      62.8877 us |     1.00
 NullableWrappers | 231,993.7278 us | 3,540.34

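Using a struct that wraps a Nullable<long> is 3540 times slower than using a struct that wraps a long!
In my case it made the difference between 800 ms and <1 ms.

Here is the environment information from BenchmarkDotNet: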

OS=Microsoft Windows NT 6.1.7601 Service Pack 1
Processor=Intel(R) Core(TM) i7-5600U CPU 2.60GHz, ProcessorCount=4
Frequency=2536269 ticks, Resolution=394.2799 ns, Timer=TSC
CLR=MS.NET 4.0.30319.42000, Arch=64-bit RELEASE [RyuJIT]
GC=Concurrent Workstation
JitModules=clrjit-v4.6.1076.0

What is the reason for the poor performance?

Mat*_*son 86

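This happens because every element of _nullableWrappers returns the same hash code from GetHashCode(), so all elements land in the same bucket and each hash lookup degenerates from O(1) to O(N).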
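To make the degeneration concrete, here is a minimal sketch that counts how many Equals calls a HashSet needs while it is being filled. The types ConstantHashKey and ValueHashKey and the class CollisionDemo are hypothetical names invented for this illustration; they are not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical key whose hash code is deliberately constant, so every key collides.
public struct ConstantHashKey : IEquatable<ConstantHashKey>
{
    public static long EqualsCalls; // number of comparisons the set had to make

    private readonly long _value;

    public ConstantHashKey(long value) { _value = value; }

    public bool Equals(ConstantHashKey other)
    {
        EqualsCalls++;
        return _value == other._value;
    }

    public override bool Equals(object obj) => obj is ConstantHashKey && Equals((ConstantHashKey) obj);

    public override int GetHashCode() => 0;
}

// Same key, but the hash code is derived from the wrapped value.
public struct ValueHashKey : IEquatable<ValueHashKey>
{
    public static long EqualsCalls;

    private readonly long _value;

    public ValueHashKey(long value) { _value = value; }

    public bool Equals(ValueHashKey other)
    {
        EqualsCalls++;
        return _value == other._value;
    }

    public override bool Equals(object obj) => obj is ValueHashKey && Equals((ValueHashKey) obj);

    public override int GetHashCode() => _value.GetHashCode();
}

public static class CollisionDemo
{
    // Fills a set with `count` distinct keys and reports how many Equals calls that took.
    public static long ComparisonsWithConstantHash(int count)
    {
        ConstantHashKey.EqualsCalls = 0;
        var set = new HashSet<ConstantHashKey>(
            Enumerable.Range(1, count).Select(i => new ConstantHashKey(i)));
        return ConstantHashKey.EqualsCalls;
    }

    public static long ComparisonsWithValueHash(int count)
    {
        ValueHashKey.EqualsCalls = 0;
        var set = new HashSet<ValueHashKey>(
            Enumerable.Range(1, count).Select(i => new ValueHashKey(i)));
        return ValueHashKey.EqualsCalls;
    }

    public static void Main()
    {
        Console.WriteLine("constant hash: " + ComparisonsWithConstantHash(1000) + " Equals calls");
        Console.WriteLine("value hash:    " + ComparisonsWithValueHash(1000) + " Equals calls");
    }
}
```

With a constant hash code, every insertion has to be compared against the keys already in the set, so the total work grows roughly quadratically with the number of elements.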

You can verify this by printing out all the hash codes.
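One possible way to do that check, as a sketch: count the distinct hash codes instead of printing all 1000. The helper CountDistinctHashCodes and the type HashedNullableLongWrapper are hypothetical names used only here. The count for the default implementation depends on the runtime, so no expected output is given for it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The struct from the question: relies on the default ValueType.GetHashCode().
public struct NullableLongWrapper
{
    private readonly long? _value;

    public NullableLongWrapper(long? value) { _value = value; }

    public long? Value => _value;
}

// Hypothetical variant that delegates to the wrapped value's hash code.
public struct HashedNullableLongWrapper
{
    private readonly long? _value;

    public HashedNullableLongWrapper(long? value) { _value = value; }

    public long? Value => _value;

    public override int GetHashCode() => _value.GetHashCode();
}

public static class HashCodeCheck
{
    // Returns how many different hash codes the items produce.
    public static int CountDistinctHashCodes<T>(IEnumerable<T> items) where T : struct
    {
        return items.Select(item => item.GetHashCode()).Distinct().Count();
    }

    public static void Main()
    {
        var defaults = Enumerable.Range(1, 1000).Select(i => new NullableLongWrapper(i)).ToList();
        var hashed = Enumerable.Range(1, 1000).Select(i => new HashedNullableLongWrapper(i)).ToList();

        // On the runtime used in the question, the first number is expected to be very small.
        Console.WriteLine("default GetHashCode:    " + CountDistinctHashCodes(defaults) + " distinct");
        Console.WriteLine("overridden GetHashCode: " + CountDistinctHashCodes(hashed) + " distinct");
    }
}
```

If the first count comes out far below the number of elements, the set is effectively working as a linked list.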

If you modify the struct as follows:

public struct NullableLongWrapper
{
    private readonly long? _value;

    public NullableLongWrapper(long? value)
    {
        _value = value;
    }

    public override int GetHashCode()
    {
        return _value.GetHashCode();
    }

    public long? Value => _value;
}

it works much more quickly.
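When GetHashCode() is overridden it is usually a good idea to define equality alongside it, so that the two stay consistent and the set does not have to box values to compare them. The following is a sketch of that pattern, not part of the original answer; NullableLongKey and EqualityDemo are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type with equality and hashing defined together.
public struct NullableLongKey : IEquatable<NullableLongKey>
{
    private readonly long? _value;

    public NullableLongKey(long? value) { _value = value; }

    public long? Value => _value;

    // Two keys are equal when both are null or both hold the same number.
    public bool Equals(NullableLongKey other) => _value == other._value;

    public override bool Equals(object obj) => obj is NullableLongKey && Equals((NullableLongKey) obj);

    // Nullable<long>.GetHashCode() returns 0 for null and the long's hash code otherwise.
    public override int GetHashCode() => _value.GetHashCode();
}

public static class EqualityDemo
{
    // Adds two duplicates of 1 and two duplicates of null; only two distinct keys remain.
    public static int DistinctKeyCount()
    {
        var set = new HashSet<NullableLongKey>
        {
            new NullableLongKey(1),
            new NullableLongKey(1),
            new NullableLongKey(null),
            new NullableLongKey(null)
        };
        return set.Count;
    }

    public static void Main()
    {
        Console.WriteLine(DistinctKeyCount()); // prints 2
    }
}
```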

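Now, the obvious question is WHY the hash code of every NullableLongWrapper is the same.

The answer to that is discussed in this thread. However, it doesn't quite answer the question, since Hans' answer revolves around a struct having TWO fields to choose from when computing the hash code. In this code there is only one field to choose from, and it is a value type (a struct).

However, the moral of this story is: never rely on the default GetHashCode() for value types!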


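Addendum

I thought that perhaps what was happening was related to Hans' answer in the thread I linked: maybe it was taking the value of the first field (the bool) in the Nullable<T> struct. My experiments indicate that it may be related, but it's complicated:

Consider this code and its output: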

using System;

public class Program
{
    static void Main()
    {
        var a = new Test {A = 0, B = 0};
        var b = new Test {A = 1, B = 0};
        var c = new Test {A = 0, B = 1};
        var d = new Test {A = 0, B = 2};
        var e = new Test {A = 0, B = 3};

        Console.WriteLine(a.GetHashCode());
        Console.WriteLine(b.GetHashCode());
        Console.WriteLine(c.GetHashCode());
        Console.WriteLine(d.GetHashCode());
        Console.WriteLine(e.GetHashCode());
    }
}

public struct Test
{
    public int A;
    public int B;
}

Output:

346948956
346948957
346948957
346948958
346948959

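Note how the second and third hash codes (for 1/0 and 0/1) are the same, but the others are all different. I find this strange: changing A clearly changes the hash code, as does changing B, yet for any two values X and Y the same hash code is generated for A=X, B=Y and for A=Y, B=X.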

(That sounds like some XOR stuff is happening behind the scenes, but that's a guess.)
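To illustrate that guess, here is a toy model that XORs a fixed seed with both fields. The seed is simply the first hash code printed above, treated as an assumed per-type constant; XorModel is a hypothetical name, and this is only a model that happens to match the five outputs, not the CLR's actual algorithm.

```csharp
using System;

public static class XorModel
{
    // Assumption: a constant taken from the first hash code in the output above.
    public const int Seed = 346948956;

    // XOR is commutative, so swapping a and b cannot change the result.
    public static int Hash(int a, int b) => Seed ^ a ^ b;

    public static void Main()
    {
        Console.WriteLine(Hash(0, 0)); // 346948956
        Console.WriteLine(Hash(1, 0)); // 346948957
        Console.WriteLine(Hash(0, 1)); // 346948957
        Console.WriteLine(Hash(0, 2)); // 346948958
        Console.WriteLine(Hash(0, 3)); // 346948959
    }
}
```

Any combination step that is symmetric in its inputs would collide on swapped field values in the same way.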

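Incidentally, this behaviour, where BOTH fields can be shown to contribute to the hash code, proves that the comment in the reference source for ValueType.GetHashCode() is inaccurate or wrong:

Action: Our algorithm for returning the hashcode is a little bit complex. We look for the first non-static field and get its hashcode. If the type has no non-static fields, we return the hashcode of the type. We can't take the hashcode of a static member because if that member is of the same type as the original type, we'll end up in an infinite loop.

If that comment were true, then four of the five hash codes in the example above would be the same, since A has the same value, 0, for all of those. (That assumes A is the first field, but you get the same results if you swap the values around: both fields clearly contribute to the hash code.)

Then I tried changing the first field to be a bool: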

using System;

public class Program
{
    static void Main()
    {
        var a = new Test {A = false, B = 0};
        var b = new Test {A = true,  B = 0};
        var c = new Test {A = false, B = 1};
        var d = new Test {A = false, B = 2};
        var e = new Test {A = false, B = 3};

        Console.WriteLine(a.GetHashCode());
        Console.WriteLine(b.GetHashCode());
        Console.WriteLine(c.GetHashCode());
        Console.WriteLine(d.GetHashCode());
        Console.WriteLine(e.GetHashCode());
    }
}

public struct Test
{
    public bool A;
    public int  B;
}

Output

346948956
346948956
346948956
346948956
346948956

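Wow! So making the first field a bool makes all the hash codes come out the same, regardless of the values of any of the fields!

This still looks like some kind of bug to me.

The bug has been fixed in .NET 4, but only for Nullable. Custom types still yield the bad behavior. (source)

  • I was too naive. I believed them. Thanks! (5 upvotes)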

eoc*_*ron 12

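This is due to how GetHashCode() behaves for structs. If the struct contains reference types, it tries to get the hash from the first non-reference-type field. In your case it found one: Nullable<> is also a struct, so it just took the hash of its private boolean value (4 bytes).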