Poor performance of static field access in collectible dynamic assemblies

Pae*_*els 17 .net c# clr

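For a dynamic binary translation simulator, I need to generate collectible .NET assemblies containing classes that access static fields. However, when static fields are used inside a collectible assembly, execution is 2-3 times slower than in a non-collectible assembly. The slowdown does not occur in collectible assemblies that do not use static fields.

In the code below, the method MyMethod of the abstract class AbstrTest is implemented by both collectible and non-collectible dynamic assemblies. With CreateTypeConst, MyMethod multiplies the ulong argument by the constant value two, while with CreateTypeField the second factor is read from the static field MyField, which is initialized in the constructor.

To obtain realistic results, the MyMethod results are accumulated in a for loop.

Here are the measurement results (.NET CLR 4.5/4.6):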

Testing non-collectible const multiply:
Elapsed: 8721.2867 ms

Testing collectible const multiply:
Elapsed: 8696.8124 ms

Testing non-collectible field multiply:
Elapsed: 10151.6921 ms

Testing collectible field multiply:
Elapsed: 33404.4878 ms

Here is my repro code:

using System;
using System.Reflection;
using System.Reflection.Emit;
using System.Diagnostics;

public abstract class AbstrTest {
  public abstract ulong MyMethod(ulong x);
}

public class DerivedClassBuilder {

  private static Type CreateTypeConst(string name, bool collect) {
    // Create an assembly.
    AssemblyName myAssemblyName = new AssemblyName();
    myAssemblyName.Name = name;
    AssemblyBuilder myAssembly = AppDomain.CurrentDomain.DefineDynamicAssembly(
       myAssemblyName, collect ? AssemblyBuilderAccess.RunAndCollect : AssemblyBuilderAccess.Run);

    // Create a dynamic module in Dynamic Assembly.
    ModuleBuilder myModuleBuilder = myAssembly.DefineDynamicModule(name);

    // Define a public class named "MyClass" in the assembly.
    TypeBuilder myTypeBuilder = myModuleBuilder.DefineType("MyClass", TypeAttributes.Public, typeof(AbstrTest));

    // Create the MyMethod method.
    MethodBuilder myMethodBuilder = myTypeBuilder.DefineMethod("MyMethod",
       MethodAttributes.Public | MethodAttributes.ReuseSlot | MethodAttributes.Virtual | MethodAttributes.HideBySig,
       typeof(ulong), new Type [] { typeof(ulong) });
    ILGenerator methodIL = myMethodBuilder.GetILGenerator();
    methodIL.Emit(OpCodes.Ldarg_1);
    methodIL.Emit(OpCodes.Ldc_I4_2);
    methodIL.Emit(OpCodes.Conv_U8);
    methodIL.Emit(OpCodes.Mul);
    methodIL.Emit(OpCodes.Ret);

    return myTypeBuilder.CreateType();
  }

  private static Type CreateTypeField(string name, bool collect) {
    // Create an assembly.
    AssemblyName myAssemblyName = new AssemblyName();
    myAssemblyName.Name = name;
    AssemblyBuilder myAssembly = AppDomain.CurrentDomain.DefineDynamicAssembly(
       myAssemblyName, collect ? AssemblyBuilderAccess.RunAndCollect : AssemblyBuilderAccess.Run);

    // Create a dynamic module in Dynamic Assembly.
    ModuleBuilder myModuleBuilder = myAssembly.DefineDynamicModule(name);

    // Define a public class named "MyClass" in the assembly.
    TypeBuilder myTypeBuilder = myModuleBuilder.DefineType("MyClass", TypeAttributes.Public, typeof(AbstrTest));

    // Define a private static ulong field named "MyField" in the type.
    FieldBuilder myFieldBuilder = myTypeBuilder.DefineField("MyField",
       typeof(ulong), FieldAttributes.Private | FieldAttributes.Static);

    // Create the constructor.
    ConstructorBuilder constructor = myTypeBuilder.DefineConstructor(
       MethodAttributes.Public | MethodAttributes.SpecialName | MethodAttributes.RTSpecialName | MethodAttributes.HideBySig,
       CallingConventions.Standard, Type.EmptyTypes);
    ConstructorInfo superConstructor = typeof(AbstrTest).GetConstructor(
       BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Instance,
       null, Type.EmptyTypes, null);
    ILGenerator constructorIL = constructor.GetILGenerator();
    constructorIL.Emit(OpCodes.Ldarg_0);
    constructorIL.Emit(OpCodes.Call, superConstructor);
    constructorIL.Emit(OpCodes.Ldc_I4_2);
    constructorIL.Emit(OpCodes.Conv_U8);
    constructorIL.Emit(OpCodes.Stsfld, myFieldBuilder);
    constructorIL.Emit(OpCodes.Ret);

    // Create the MyMethod method.
    MethodBuilder myMethodBuilder = myTypeBuilder.DefineMethod("MyMethod",
       MethodAttributes.Public | MethodAttributes.ReuseSlot | MethodAttributes.Virtual | MethodAttributes.HideBySig,
       typeof(ulong), new Type [] { typeof(ulong) });
    ILGenerator methodIL = myMethodBuilder.GetILGenerator();
    methodIL.Emit(OpCodes.Ldarg_1);
    methodIL.Emit(OpCodes.Ldsfld, myFieldBuilder);
    methodIL.Emit(OpCodes.Mul);
    methodIL.Emit(OpCodes.Ret);

    return myTypeBuilder.CreateType();
  }

  public static void Main() {
    ulong accu;
    Stopwatch stopwatch;
    try {
      Console.WriteLine("Testing non-collectible const multiply:");
      AbstrTest i0 = (AbstrTest)Activator.CreateInstance(
        CreateTypeConst("MyClassModule0", false));
      stopwatch = Stopwatch.StartNew();
      accu = 0;
      for (uint i = 0; i < 0xffffffff; i++)
        accu += i0.MyMethod(i);
      stopwatch.Stop();
      Console.WriteLine("Elapsed: " + stopwatch.Elapsed.TotalMilliseconds + " ms");

      Console.WriteLine("Testing collectible const multiply:");
      AbstrTest i1 = (AbstrTest)Activator.CreateInstance(
        CreateTypeConst("MyClassModule1", true));
      stopwatch = Stopwatch.StartNew();
      accu = 0;
      for (uint i = 0; i < 0xffffffff; i++)
        accu += i1.MyMethod(i);
      stopwatch.Stop();
      Console.WriteLine("Elapsed: " + stopwatch.Elapsed.TotalMilliseconds + " ms");

      Console.WriteLine("Testing non-collectible field multiply:");
      AbstrTest i2 = (AbstrTest)Activator.CreateInstance(
        CreateTypeField("MyClassModule2", false));
      stopwatch = Stopwatch.StartNew();
      accu = 0;
      for (uint i = 0; i < 0xffffffff; i++)
        accu += i2.MyMethod(i);
      stopwatch.Stop();
      Console.WriteLine("Elapsed: " + stopwatch.Elapsed.TotalMilliseconds + " ms");

      Console.WriteLine("Testing collectible field multiply:");
      AbstrTest i3 = (AbstrTest)Activator.CreateInstance(
        CreateTypeField("MyClassModule3", true));
      stopwatch = Stopwatch.StartNew();
      accu = 0;
      for (uint i = 0; i < 0xffffffff; i++)
        accu += i3.MyMethod(i);
      stopwatch.Stop();
      Console.WriteLine("Elapsed: " + stopwatch.Elapsed.TotalMilliseconds + " ms");
    }
    catch (Exception e) {
      Console.WriteLine("Exception Caught " + e.Message);
    }
  }
}

So my question is: why is it slower?

Han*_*ant 14

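Yes, this is a pretty much unavoidable consequence of the way static variables are allocated. I'll first describe how you put the "visual" back into Visual Studio: you only have a shot at diagnosing perf problems like this when you can look at the machine code that the jitter generates.

That is tricky to do for Reflection.Emit code. You can't step through the delegate call, nor do you have any way to find out exactly where the code was generated. What you want to do is inject a call to Debugger.Break() so the debugger stops at exactly the right spot. So: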

    ILGenerator methodIL = myMethodBuilder.GetILGenerator();
    var brk = typeof(Debugger).GetMethod("Break");
    methodIL.Emit(OpCodes.Call, brk);
    methodIL.Emit(OpCodes.Ldarg_1);
    // etc..

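Change the loop repeat count to 1. In Tools > Options > Debugging > General, untick "Just My Code" and "Suppress JIT optimization". On the Debug tab of the project properties, tick "Enable native code debugging". Switch to the Release build. I'll post the 32-bit code; it is more fun, since the x64 jitter can do a much better job.

The machine code for the "Testing non-collectible field multiply" test looks like this: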

01410E70  push        dword ptr [ebp+0Ch]        ; Ldarg_1, high 32-bits
01410E73  push        dword ptr [ebp+8]          ; Ldarg_1, low 32-bits
01410E76  push        dword ptr ds:[13A6528h]    ; myFieldBuilder, high 32-bits
01410E7C  push        dword ptr ds:[13A6524h]    ; myFieldBuilder, low 32-bits 
01410E82  call        @JIT_LMul@16 (73AE1C20h)   ; 64 bit multiply

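Nothing very drastic is going on: it calls a CLR helper method to perform the 64-bit multiply. The x64 jitter can do this with a single IMUL instruction. Note the access to the static variable (MyField, labeled myFieldBuilder in the listing): it has a hard-coded address, 0x13A6524. It will be different on your machine. This is very efficient.

Now the disappointing one: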

059F0480  push        dword ptr [ebp+0Ch]        ; Ldarg_1, high 32-bits
059F0483  push        dword ptr [ebp+8]          ; Ldarg_1, low 32-bits
059F0486  mov         ecx,59FC8A0h               ; arg2 = DynamicClassDomainId
059F048B  xor         edx,edx                    ; arg1 = DomainId
059F048D  call        JIT_GetSharedNonGCStaticBaseDynamicClass (73E0A6C7h)  
059F0492  push        dword ptr [eax+8]          ; @myFieldBuilder, high 32-bits
059F0495  push        dword ptr [eax+4]          ; @myFieldBuilder, low 32-bits
059F0498  call        @JIT_LMul@16 (73AE1C20h)   ; 64-bit multiply

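You can tell why it is slower from half a mile away: there is an extra call to JIT_GetSharedNonGCStaticBaseDynamicClass. It is a helper function in the CLR that was specifically designed to deal with static variables used in Reflection.Emit code built with AssemblyBuilderAccess.RunAndCollect. Its source code is publicly available today. It makes everybody's eyes bleed, but it is a function that maps an AppDomain identifier and a dynamic class identifier (aka type handle) to an allocated piece of memory that stores the static variables.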

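In the "non-collectible" version, the jitter knows the specific address where the static variable is stored. It allocated the variable when it jitted the code, from an internal structure called the "loader heap" that is associated with the AppDomain. Knowing the exact address of the variable, it can emit that address directly in the machine code. Very efficient of course, there is no possible way to do this faster.

But that cannot work in the "collectible" version: it has to garbage collect not just the machine code but the static variables as well. That can only work when the storage is allocated dynamically, so that it can also be released dynamically. The extra indirection, compare it to a Dictionary lookup, is what makes the code slower.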
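
The Dictionary comparison can be pictured with a small C# model. This is only a conceptual sketch and not how the CLR is actually implemented: the names StaticsLookupModel, fixedSlot, dynamicStatics and the integer class id are all hypothetical, chosen to contrast a storage location that is fixed once with one that must be looked up on every access.

```csharp
using System;
using System.Collections.Generic;

public static class StaticsLookupModel {
  // Models the non-collectible case: the storage location is fixed once,
  // so the generated code can refer to it directly.
  private static ulong fixedSlot = 2;

  // Models the collectible case (hypothetical): the storage block is found
  // through a lookup keyed by a class identifier on every access.
  private static readonly Dictionary<int, ulong[]> dynamicStatics =
    new Dictionary<int, ulong[]>();

  private static ulong MultiplyDirect(ulong x) {
    return x * fixedSlot;
  }

  private static ulong MultiplyViaLookup(int classId, ulong x) {
    ulong[] block = dynamicStatics[classId]; // the extra step paid per access
    return x * block[0];
  }

  public static void Main() {
    dynamicStatics[1] = new ulong[] { 2 };
    Console.WriteLine(MultiplyDirect(21));
    Console.WriteLine(MultiplyViaLookup(1, 21));

    // Because no code hard-codes the block's address, the storage can be
    // dropped again, which is what makes collection possible.
    dynamicStatics.Remove(1);
  }
}
```

Both calls compute the same product; the model only illustrates where the additional work per access comes from.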

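You will perhaps now appreciate why .NET assemblies (and code) cannot be unloaded unless the AppDomain is unloaded. It is a very, very important perf optimization.

I'm not sure what kind of recommendation you are hoping for. One option is to take care of the static variable storage yourself, with a class that has instance fields. There is no problem getting those collected. It still won't be as fast, since it takes an extra indirection, but it is definitely faster than letting the CLR take care of it.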
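
A minimal sketch of that suggestion, under one possible reading of it: the field simply becomes an instance field of the generated class itself. CreateTypeInstanceField and InstanceFieldSketch are hypothetical names, the rest mirrors CreateTypeField from the question, and no timing is claimed for it here.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public abstract class AbstrTest {
  public abstract ulong MyMethod(ulong x);
}

public static class InstanceFieldSketch {
  // Hypothetical variant of CreateTypeField: MyField is an instance field.
  public static Type CreateTypeInstanceField(string name, bool collect) {
    AssemblyBuilder asm = AppDomain.CurrentDomain.DefineDynamicAssembly(
       new AssemblyName(name),
       collect ? AssemblyBuilderAccess.RunAndCollect : AssemblyBuilderAccess.Run);
    ModuleBuilder mod = asm.DefineDynamicModule(name);
    TypeBuilder tb = mod.DefineType("MyClass", TypeAttributes.Public, typeof(AbstrTest));

    // Instance field: note the absence of FieldAttributes.Static.
    FieldBuilder field = tb.DefineField("MyField", typeof(ulong), FieldAttributes.Private);

    // Constructor: call the base constructor, then store 2 into this.MyField.
    ConstructorBuilder ctor = tb.DefineConstructor(
       MethodAttributes.Public | MethodAttributes.SpecialName | MethodAttributes.RTSpecialName | MethodAttributes.HideBySig,
       CallingConventions.Standard, Type.EmptyTypes);
    ConstructorInfo baseCtor = typeof(AbstrTest).GetConstructor(
       BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Instance,
       null, Type.EmptyTypes, null);
    ILGenerator ctorIL = ctor.GetILGenerator();
    ctorIL.Emit(OpCodes.Ldarg_0);
    ctorIL.Emit(OpCodes.Call, baseCtor);
    ctorIL.Emit(OpCodes.Ldarg_0);      // 'this' for stfld
    ctorIL.Emit(OpCodes.Ldc_I4_2);
    ctorIL.Emit(OpCodes.Conv_U8);
    ctorIL.Emit(OpCodes.Stfld, field);
    ctorIL.Emit(OpCodes.Ret);

    // MyMethod: return x * this.MyField.
    MethodBuilder mb = tb.DefineMethod("MyMethod",
       MethodAttributes.Public | MethodAttributes.ReuseSlot | MethodAttributes.Virtual | MethodAttributes.HideBySig,
       typeof(ulong), new Type[] { typeof(ulong) });
    ILGenerator il = mb.GetILGenerator();
    il.Emit(OpCodes.Ldarg_1);
    il.Emit(OpCodes.Ldarg_0);          // 'this' for ldfld
    il.Emit(OpCodes.Ldfld, field);
    il.Emit(OpCodes.Mul);
    il.Emit(OpCodes.Ret);

    return tb.CreateType();
  }

  public static void Main() {
    AbstrTest t = (AbstrTest)Activator.CreateInstance(
      CreateTypeInstanceField("MyClassModule4", true));
    Console.WriteLine(t.MyMethod(21)); // prints 42
  }
}
```

If several generated classes need to share the value, the same idea should work with a separate holder object passed to each instance, at the cost of one more field load.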