Performance of Func<T> and inheritance

atl*_*ste 10 .net c# generics performance inheritance

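I've been having trouble understanding the performance characteristics of using Func<...> throughout my code when inheritance and generics are involved, which is a combination I find myself using all the time.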

Let me start with a minimal test case so we all know what we're talking about; then I'll post the results, and then I'll explain what I expected and why...

Minimal test case

using System;

public class GenericsTest2 : GenericsTest<int>
{
    static void Main(string[] args)
    {
        GenericsTest2 at = new GenericsTest2();

        at.test(at.func);
        at.test(at.Check);
        at.test(at.func2);
        at.test(at.Check2);
        at.test((a) => a.Equals(default(int)));
        Console.ReadLine();
    }

    public GenericsTest2()
    {
        func = func2 = (a) => Check(a);
    }

    protected Func<int, bool> func2;

    public bool Check2(int value)
    {
        return value.Equals(default(int));
    }

    public void test(Func<int, bool> func)
    {
        using (Stopwatch sw = new Stopwatch((ts) => { Console.WriteLine("Took {0:0.00}s", ts.TotalSeconds); }))
        {
            for (int i = 0; i < 100000000; ++i)
            {
                func(i);
            }
        }
    }
}

public class GenericsTest<T>
{
    public bool Check(T value)
    {
        return value.Equals(default(T));
    }

    protected Func<T, bool> func;
}

public class Stopwatch : IDisposable
{
    public Stopwatch(Action<TimeSpan> act)
    {
        this.act = act;
        this.start = DateTime.UtcNow;
    }

    private Action<TimeSpan> act;
    private DateTime start;

    public void Dispose()
    {
        act(DateTime.UtcNow.Subtract(start));
    }
}

Results

Took 2.50s  -> at.test(at.func);
Took 1.97s  -> at.test(at.Check);
Took 2.48s  -> at.test(at.func2);
Took 0.72s  -> at.test(at.Check2);
Took 0.81s  -> at.test((a) => a.Equals(default(int)));

What I expected, and why

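I would expect this code to run at exactly the same speed for all 5 methods; to be more precise, even faster than any of them, namely just as fast as: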

using (Stopwatch sw = new Stopwatch((ts) => { Console.WriteLine("Took {0:0.00}s", ts.TotalSeconds); }))
{
    for (int i = 0; i < 100000000; ++i)
    {
        bool b = i.Equals(default(int));
    }
}
// this takes 0.32s ?!?

I expected it to take 0.32s, because I don't see any reason for the JIT compiler not to inline the code in this particular case.

On closer inspection, I don't understand these performance numbers at all:

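  • at.func is passed to the function and cannot be changed during execution. Why isn't it inlined?
  • at.Check is apparently slower than at.Check2 (1.97s versus 0.72s), even though neither can be overridden and at.Check, in the case of class GenericsTest2, is as fixed as a rock.
  • I see no reason for Func<int, bool> to be slower when passing an inline Func instead of a method that is converted to a Func.
  • Why is the difference between test cases 2 and 3 a whopping 0.5s, while the difference between cases 4 and 5 is 0.1s? Shouldn't they be the same?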

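I really want to understand this... What is going on here that makes using a generic base class roughly 8x slower (2.50s versus 0.32s) than inlining the whole lot?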

So, basically the question is: why is this happening, and how can I fix it?

UPDATE

Based on all the comments so far (thanks!), I did some more digging.

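First, here is a new set of results, obtained by repeating the tests with the loop made 5x larger and executing everything 4 times. I used the Diagnostics stopwatch and added more tests (each labelled with a description).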

(Baseline implementation took 2.61s)

--- Run 0 ---
Took 3.00s for (a) => at.Check2(a)
Took 12.04s for Check3<int>
Took 12.51s for (a) => GenericsTest2.Check(a)
Took 13.74s for at.func
Took 16.07s for GenericsTest2.Check
Took 12.99s for at.func2
Took 1.47s for at.Check2
Took 2.31s for (a) => a.Equals(default(int))
--- Run 1 ---
Took 3.18s for (a) => at.Check2(a)
Took 13.29s for Check3<int>
Took 14.10s for (a) => GenericsTest2.Check(a)
Took 13.54s for at.func
Took 13.48s for GenericsTest2.Check
Took 13.89s for at.func2
Took 1.94s for at.Check2
Took 2.61s for (a) => a.Equals(default(int))
--- Run 2 ---
Took 3.18s for (a) => at.Check2(a)
Took 12.91s for Check3<int>
Took 15.20s for (a) => GenericsTest2.Check(a)
Took 12.90s for at.func
Took 13.79s for GenericsTest2.Check
Took 14.52s for at.func2
Took 2.02s for at.Check2
Took 2.67s for (a) => a.Equals(default(int))
--- Run 3 ---
Took 3.17s for (a) => at.Check2(a)
Took 12.69s for Check3<int>
Took 13.58s for (a) => GenericsTest2.Check(a)
Took 14.27s for at.func
Took 12.82s for GenericsTest2.Check
Took 14.03s for at.func2
Took 1.32s for at.Check2
Took 1.70s for (a) => a.Equals(default(int))

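What I noticed from these results is that the moment you start using generics, it gets much slower. Digging further into the IL, this is what I found for the non-generic implementation: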

L_0000: ldarga.s 'value'
L_0002: ldc.i4.0 
L_0003: call instance bool [mscorlib]System.Int32::Equals(int32)
L_0008: ret 

and for all the generic implementations:

L_0000: ldarga.s 'value'
L_0002: ldloca.s CS$0$0000
L_0004: initobj !T
L_000a: ldloc.0 
L_000b: box !T
L_0010: constrained. !T
L_0016: callvirt instance bool [mscorlib]System.Object::Equals(object)
L_001b: ret 

While most of this can be optimized, I suppose the callvirt might be a problem.

In an attempt to make it faster, I added the "T : IEquatable<T>" constraint to the definition of the method. The result is:

L_0011: callvirt instance bool [mscorlib]System.IEquatable`1<!T>::Equals(!0)

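While I now understand more about the performance (it probably cannot be inlined because it creates a vtable lookup), I'm still confused: why doesn't it simply call T::Equals? After all, I did specify that it will be there...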
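The variants discussed in the update can be put side by side in a minimal sketch. The names `EqualityChecks`, `CheckUnconstrained`, `CheckEquatable` and `CheckComparer` are hypothetical and not part of the original test, and `EqualityComparer<T>.Default` is a standard-library alternative that the question itself does not use. The sketch only shows that the three forms agree on the result; it makes no claim about their relative speed, which would have to be measured.

```csharp
using System;
using System.Collections.Generic;

public static class EqualityChecks
{
    // Unconstrained T: the only Equals the compiler can bind to is
    // Object.Equals(object), so default(T) is boxed (see the generic IL above).
    public static bool CheckUnconstrained<T>(T value)
    {
        return value.Equals(default(T));
    }

    // With the constraint, the call binds to IEquatable<T>.Equals(T),
    // which takes a T, so the argument does not need to be boxed.
    public static bool CheckEquatable<T>(T value) where T : IEquatable<T>
    {
        return value.Equals(default(T));
    }

    // Assumption: an alternative that is not used in the question.
    public static bool CheckComparer<T>(T value)
    {
        return EqualityComparer<T>.Default.Equals(value, default(T));
    }

    public static void Main()
    {
        Console.WriteLine(CheckUnconstrained(0)); // True
        Console.WriteLine(CheckEquatable(0));     // True
        Console.WriteLine(CheckComparer(0));      // True
        Console.WriteLine(CheckEquatable(5));     // False
    }
}
```

Any of the three can be passed to the `test` method from the question as a `Func<int, bool>`, for example `at.test(EqualityChecks.CheckEquatable<int>)`.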

cit*_*kid 7

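Always run a micro-benchmark 3 times. The first run triggers the JIT and should be discarded. Then check that the second and third runs agree. This gives: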

... run ...
Took 0.79s
Took 0.63s
Took 0.74s
Took 0.24s
Took 0.32s
... run ...
Took 0.73s
Took 0.63s
Took 0.73s
Took 0.24s
Took 0.33s
... run ...
Took 0.74s
Took 0.63s
Took 0.74s
Took 0.25s
Took 0.33s
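The "run it three times and discard the first" rule can be sketched as a small helper. `BenchmarkSketch` and `Measure` are hypothetical names, and the sketch uses `System.Diagnostics.Stopwatch` rather than the custom `Stopwatch` class from the question; it is an illustration of the procedure, not the harness that produced the numbers above.

```csharp
using System;
using System.Diagnostics;

public static class BenchmarkSketch
{
    // Times `iterations` calls of `func`, repeated `runs` times,
    // and returns the elapsed seconds of each run.
    public static double[] Measure(Func<int, bool> func, int iterations, int runs)
    {
        var seconds = new double[runs];
        for (int run = 0; run < runs; ++run)
        {
            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; ++i)
            {
                func(i);
            }
            sw.Stop();
            seconds[run] = sw.Elapsed.TotalSeconds;
        }
        return seconds;
    }

    public static void Main()
    {
        double[] result = Measure(a => a.Equals(default(int)), 100000000, 3);

        // Run 0 includes JIT compilation; compare runs 1 and 2 instead.
        for (int run = 0; run < result.Length; ++run)
        {
            string note = run == 0 ? " (warm-up, discard)" : "";
            Console.WriteLine("Run {0}: {1:0.00}s{2}", run, result[run], note);
        }
    }
}
```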

This line

func = func2 = (a) => Check(a);

adds an extra function call. Remove it by writing

func = func2 = this.Check;

which gives:

... 1. run ...
Took 0.64s
Took 0.63s
Took 0.63s
Took 0.24s
Took 0.32s
... 2. run ...
Took 0.63s
Took 0.63s
Took 0.63s
Took 0.24s
Took 0.32s
... 3. run ...
Took 0.63s
Took 0.63s
Took 0.63s
Took 0.24s
Took 0.32s

This shows that the (JIT?) effect between run 1 and run 2 disappeared once the extra function call was removed. The first 3 tests are now identical.
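The extra call can be made visible by inspecting which method each delegate points at. This is a minimal sketch with hypothetical names (`DelegateTargets`, `ViaLambda`, `ViaMethodGroup`); the exact name of the compiler-generated method for the lambda is an implementation detail, so the sketch prints it without assuming its value.

```csharp
using System;

public class DelegateTargets
{
    public bool Check(int value)
    {
        return value.Equals(default(int));
    }

    // The lambda becomes a compiler-generated method that in turn calls Check,
    // so invoking this delegate costs one extra call.
    public Func<int, bool> ViaLambda()
    {
        return (a) => Check(a);
    }

    // The method group conversion points the delegate directly at Check.
    public Func<int, bool> ViaMethodGroup()
    {
        return this.Check;
    }

    public static void Main()
    {
        var t = new DelegateTargets();
        Console.WriteLine(t.ViaLambda().Method.Name);      // a compiler-generated name
        Console.WriteLine(t.ViaMethodGroup().Method.Name); // Check
    }
}
```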

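In tests 4 and 5, the compiler can inline the function argument into void test(Func<>), while in tests 1 to 3 it would be a long way for the compiler to figure out that they are constant. Sometimes the compiler has constraints that are not easy to see from the coder's perspective, such as the .NET and JIT constraints that come from the dynamic nature of .NET programs, compared to a binary built from C++. Either way, it is the inlining of the function argument that makes the difference here.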

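The difference between 4 and 5? Well, test 5 looks like something the compiler can also inline very easily. Maybe it builds a context for the closure, and resolving that is a bit more complex than needed. I did not dig into the MSIL to figure it out.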

The tests above were run with .NET 4.5. Here are the results with 3.5, which show that the compiler has since become better at inlining:

... 1. run ...
Took 1.06s
Took 1.06s
Took 1.06s
Took 0.24s
Took 0.27s
... 2. run ...
Took 1.06s
Took 1.08s
Took 1.06s
Took 0.25s
Took 0.27s
... 3. run ...
Took 1.05s
Took 1.06s
Took 1.05s
Took 0.24s
Took 0.27s

And .NET 4:

... 1. run ...
Took 0.97s
Took 0.97s
Took 0.96s
Took 0.22s
Took 0.30s
... 2. run ...
Took 0.96s
Took 0.96s
Took 0.96s
Took 0.22s
Took 0.30s
... 3. run ...
Took 0.97s
Took 0.96s
Took 0.96s
Took 0.22s
Took 0.30s

And now, changing GenericsTest<> to a non-generic GenericsTest!!

... 1. run ...
Took 0.28s
Took 0.24s
Took 0.24s
Took 0.24s
Took 0.27s
... 2. run ...
Took 0.24s
Took 0.24s
Took 0.24s
Took 0.24s
Took 0.27s
... 3. run ...
Took 0.25s
Took 0.25s
Took 0.25s
Took 0.24s
Took 0.27s

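This is a surprise from the C# compiler, similar to what I encountered when sealing classes to avoid virtual function calls. Maybe Eric Lippert has a word on this?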

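Replacing the inheritance with aggregation brings the performance back. I have learned to never use inheritance, or at most very, very rarely, and I strongly recommend that you avoid it at least in this case. (This is my pragmatic solution to this problem; no flame war intended.) I do use interfaces all the way, though, and they carry no performance penalty.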
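One way to read "replace inheritance with aggregation" is sketched below. `IntChecker` and `AggregatingTest` are hypothetical names, and this is an assumption about the shape of the change rather than the exact code that was measured; whether it reproduces the timings above on a given runtime would need to be checked with the benchmark.

```csharp
using System;

// Hypothetical non-generic helper that takes the place of GenericsTest<int>.
public sealed class IntChecker
{
    public bool Check(int value)
    {
        return value.Equals(default(int));
    }
}

// Holds the checker as a field (aggregation) instead of deriving from it.
public sealed class AggregatingTest
{
    private readonly IntChecker checker = new IntChecker();
    private readonly Func<int, bool> func;

    public AggregatingTest()
    {
        // Method group conversion: the delegate targets IntChecker.Check directly.
        func = checker.Check;
    }

    public bool Run(int value)
    {
        return func(value);
    }

    public static void Main()
    {
        var at = new AggregatingTest();
        Console.WriteLine(at.Run(0)); // True
        Console.WriteLine(at.Run(7)); // False
    }
}
```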

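  • When the method being called is not virtual, callvirt is documented as having exactly the same semantics as call, except that callvirt does a null check on top. That is why the C# compiler generates callvirt rather than call: it knows it needs a null check on the receiver. Otherwise it would have to generate a null check followed by a call, which would be larger and slower code. (3 upvotes)
  • @StefandeBruijn: callvirt does not do a vtable lookup if there is no vtable! It is perfectly legal to call a static or instance method with callvirt; the jitter will turn it into a non-virtual call with a null check on top. (2 upvotes)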
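The null check described in these comments can be observed directly. In this minimal sketch (the names `Receiver` and `CallvirtNullCheck` are hypothetical), a non-virtual method that never touches `this` still throws when called on a null receiver.

```csharp
using System;

public class Receiver
{
    // Non-virtual, and the body never uses `this`.
    public string Hello()
    {
        return "hello";
    }
}

public static class CallvirtNullCheck
{
    private static Receiver GetNull()
    {
        return null;
    }

    public static bool ThrowsOnNullReceiver()
    {
        Receiver r = GetNull();
        try
        {
            r.Hello();
            return false;
        }
        catch (NullReferenceException)
        {
            // The receiver is checked for null before Hello runs.
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsOnNullReceiver()); // True
    }
}
```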