Performance issue with for loops on the first run on .NET 7

jrw*_*jrw 3 c# performance .net-7.0

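I'm working on a performance-sensitive application and am considering a move from .NET 6 to .NET 7.

While comparing the two versions, I found that .NET 7 executes a for loop more slowly on the first run.

The test was done with two separate console applications containing identical code, one on .NET 6 and the other on .NET 7, both built in Release mode for Any CPU.

Test code: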

using System.Diagnostics;

int size = 1000000;
Stopwatch sw = new();

//create array
float[] arr = new float[size];
for (int i = 0; i < size; i++)
    arr[i] = i;

Console.WriteLine(AppDomain.CurrentDomain.SetupInformation.TargetFrameworkName);

Console.WriteLine($"\nForLoop1");
ForLoop1();
ForLoop1();
ForLoop1();
ForLoop1();
ForLoop1();

Console.WriteLine($"\nForLoopArray");
ForLoopArray();
ForLoopArray();
ForLoopArray();
ForLoopArray();
ForLoopArray();

Console.WriteLine($"\nForLoop2");
ForLoop2();
ForLoop2();
ForLoop2();
ForLoop2();
ForLoop2();

void ForLoop1()
{
    sw.Restart();

    int sum = 0;
    for (int i = 0; i < size; i++)
        sum++;

    sw.Stop();
    Console.WriteLine($"{sw.ElapsedTicks} ticks ({sum})");
}

void ForLoopArray()
{
    sw.Restart();

    float sum = 0f;
    for (int i = 0; i < size; i++)
        sum += arr[i];

    sw.Stop();
    Console.WriteLine($"{sw.ElapsedTicks} ticks ({sum})");
}

void ForLoop2()
{
    sw.Restart();

    int sum = 0;
    for (int i = 0; i < size; i++)
        sum++;

    sw.Stop();
    Console.WriteLine($"{sw.ElapsedTicks} ticks ({sum})");
}

.NET 6 console output:

.NETCoreApp,Version=v6.0

ForLoop1
2989 ticks (1000000)
2846 ticks (1000000)
2851 ticks (1000000)
3180 ticks (1000000)
2841 ticks (1000000)

ForLoopArray
8270 ticks (4.9994036E+11)
8443 ticks (4.9994036E+11)
8354 ticks (4.9994036E+11)
8952 ticks (4.9994036E+11)
8458 ticks (4.9994036E+11)

ForLoop2
2842 ticks (1000000)
2844 ticks (1000000)
3117 ticks (1000000)
2835 ticks (1000000)
2992 ticks (1000000)

As you can see, the .NET 6 timings are very similar from call to call, while the .NET 7 timings show high initial values (19658, 20041 and 14016 ticks).

Fiddling with the environment variables DOTNET_ReadyToRun and DOTNET_TieredPGO only made things worse.

Why is this happening, and how can it be fixed?

Gur*_*ron 5

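My guess is that this is connected to the on-stack replacement (OSR) feature introduced in .NET 7. Enabling DOTNET_JitDisasmSummary "on my machine" (Windows PowerShell: $env:DOTNET_JitDisasmSummary=1) produces the following output: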

ForLoop1
   9: JIT compiled Program:<<Main>$>g__ForLoop1|0_0(byref) [Tier0, IL size=118, code size=291]
  10: JIT compiled Program:<<Main>$>g__ForLoop1|0_0(byref) [Tier1-OSR @0x19, IL size=118, code size=571]
13420 ticks (1000000)
2431 ticks (1000000)
...

ForLoopArray
  11: JIT compiled Program:<<Main>$>g__ForLoopArray|0_1(byref) [Tier0, IL size=129, code size=339]
  12: JIT compiled Program:<<Main>$>g__ForLoopArray|0_1(byref) [Tier1-OSR @0x24, IL size=129, code size=609]
  13: JIT compiled System.SpanHelpers:SequenceCompareTo(byref,int,byref,int) [Tier1, IL size=632, code size=329]
19380 ticks (4.9994036E+11)
10694 ticks (4.9994036E+11)
...

ForLoop2
  14: JIT compiled Program:<<Main>$>g__ForLoop2|0_2(byref) [Tier0, IL size=118, code size=291]
  15: JIT compiled Program:<<Main>$>g__ForLoop2|0_2(byref) [Tier1-OSR @0x19, IL size=118, code size=549]
11720 ticks (1000000)
2549 ticks (1000000)
...
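In the summary above each method is first compiled at Tier0 and then again as Tier1-OSR, and only the first timing of each method is high. A minimal sketch for reporting that first call separately from later calls, assuming the same loop shape as ForLoop1 in the question (the names Time and SumLoop are made up for this example):

```csharp
using System;
using System.Diagnostics;

const int Size = 1_000_000;

// First call: on .NET 7 this may start in Tier0 code and move to OSR code mid-loop.
long first = Time(SumLoop);

// Later calls: keep the fastest of a few runs as a rough steady-state figure.
long best = long.MaxValue;
for (int run = 0; run < 5; run++)
    best = Math.Min(best, Time(SumLoop));

Console.WriteLine($"first call: {first} ticks, best of later calls: {best} ticks");

// Times one invocation of the given function in Stopwatch ticks.
long Time(Func<int> body)
{
    var sw = Stopwatch.StartNew();
    int result = body();
    sw.Stop();
    if (result != Size)
        throw new InvalidOperationException("unexpected sum");
    return sw.ElapsedTicks;
}

// Same loop shape as ForLoop1 in the question.
int SumLoop()
{
    int sum = 0;
    for (int i = 0; i < Size; i++)
        sum++;
    return sum;
}
```

If only the first number is large, the cost is most likely one-time JIT and tiering work rather than slower steady-state code. For anything beyond a quick check, a harness such as BenchmarkDotNet, which runs warm-up iterations before measuring, gives more reliable numbers.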

Setting DOTNET_TC_QuickJitForLoops to 0 ($env:DOTNET_TC_QuickJitForLoops=0) "restores" the previous behavior. I'm not sure why, because the docs state that the default is false; maybe something changed in .NET 7.
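If changing the setting for the whole process is too broad, a per-method alternative may be MethodImplOptions.AggressiveOptimization, which asks the JIT to compile that method fully optimized on first use instead of sending it through the tiers. A minimal sketch, assuming the loop from the question (SumLoopOptimized is a made-up name, and I have not measured this against the numbers above):

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

const int Size = 1_000_000;

for (int run = 0; run < 3; run++)
{
    var sw = Stopwatch.StartNew();
    int sum = SumLoopOptimized();
    sw.Stop();
    Console.WriteLine($"{sw.ElapsedTicks} ticks ({sum})");
}

// AggressiveOptimization requests fully optimized code on the first call,
// so this method is not expected to go through Tier0 and OSR.
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
int SumLoopOptimized()
{
    int sum = 0;
    for (int i = 0; i < Size; i++)
        sum++;
    return sum;
}
```

The trade-off is that such a method is excluded from tiering, so it does not benefit from tiered PGO, and its first call pays the full optimizing-JIT cost up front. To apply the process-wide setting without an environment variable, the documented equivalents are the TieredCompilationQuickJitForLoops MSBuild property and the System.Runtime.TieredCompilation.QuickJitForLoops entry in runtimeconfig.json.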

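There may be a related discussion on GitHub.

P.S.

If your code is performance-sensitive, and startup-sensitive in particular, it may be worth looking into Native AOT.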