Java Math.abs(int) optimization: why is this code 6 times slower?

Asked by xte*_*ern (score 19) · Tags: java, performance, jit, x86-64

As you know, Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE. To prevent negative results, the following safeAbs method was implemented in my project:

    public static int safeAbs(int i) {
        i = Math.abs(i);

        return i < 0 ? 0 : i;
    }

I compared its performance with the following:

    public static int safeAbs(int i) {
        return i == Integer.MIN_VALUE ? 0 : Math.abs(i);
    }
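Both variants are meant to return the same value for every int, including Integer.MIN_VALUE, so the speed difference should not come from different semantics. A quick check, as a sketch: the class name SafeAbsVariants is hypothetical, and the method names are borrowed from the benchmark names used later.

```java
// Compares the two safeAbs variants from the question on edge cases.
// SafeAbsVariants is an illustrative class name, not from the original project.
class SafeAbsVariants {
    // Variant 1 (the "slow" one): take abs first, then clamp a negative result to 0.
    static int safeAbsSlow(int i) {
        i = Math.abs(i);
        return i < 0 ? 0 : i;
    }

    // Variant 2 (the "fast" one): special-case MIN_VALUE before calling abs.
    static int safeAbsFast(int i) {
        return i == Integer.MIN_VALUE ? 0 : Math.abs(i);
    }

    public static void main(String[] args) {
        // Math.abs overflows for MIN_VALUE: 2^31 does not fit in an int.
        System.out.println(Math.abs(Integer.MIN_VALUE)); // prints -2147483648
        int[] samples = {0, 1, -1, 42, -42, Integer.MAX_VALUE, -Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int s : samples) {
            System.out.println(s + " -> " + safeAbsSlow(s) + " / " + safeAbsFast(s));
        }
    }
}
```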

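The first version is almost 6 times slower than the second (the second performs almost the same as a plain Math.abs(int)). As far as I can tell, there is no significant difference in the bytecode, so I guess the difference lies in the JIT-compiled assembly: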

"Slow" version:

  0x00007f0149119720: mov     %eax,0xfffffffffffec000(%rsp)
  0x00007f0149119727: push    %rbp
  0x00007f0149119728: sub     $0x20,%rsp
  0x00007f014911972c: test    %esi,%esi
  0x00007f014911972e: jl      0x7f0149119734
  0x00007f0149119730: mov     %esi,%eax
  0x00007f0149119732: jmp     0x7f014911973c
  0x00007f0149119734: neg     %esi
  0x00007f0149119736: test    %esi,%esi
  0x00007f0149119738: jl      0x7f0149119748
  0x00007f014911973a: mov     %esi,%eax
  0x00007f014911973c: add     $0x20,%rsp
  0x00007f0149119740: pop     %rbp
  0x00007f0149119741: test    %eax,0x1772e8b9(%rip)  ;   {poll_return}
  0x00007f0149119747: retq
  0x00007f0149119748: mov     %esi,(%rsp)
  0x00007f014911974b: mov     $0xffffff65,%esi
  0x00007f0149119750: nop
  0x00007f0149119753: callq   0x7f01490051a0    ; OopMap{off=56}
                                                ;*ifge
                                                ; - math.FastAbs::safeAbsSlow@6 (line 16)
                                                ;   {runtime_call}
  0x00007f0149119758: callq   0x7f015f521d20    ;   {runtime_call}

"Normal" version:

  # {method} {0x00007f31acf28cd8} 'safeAbsFast' '(I)I' in 'math/FastAbs'
  # parm0:    rsi       = int
  #           [sp+0x30]  (sp of caller)
  0x00007f31b08c7360: mov     %eax,0xfffffffffffec000(%rsp)
  0x00007f31b08c7367: push    %rbp
  0x00007f31b08c7368: sub     $0x20,%rsp
  0x00007f31b08c736c: cmp     $0x80000000,%esi
  0x00007f31b08c7372: je      0x7f31b08c738e
  0x00007f31b08c7374: mov     %esi,%r10d
  0x00007f31b08c7377: neg     %r10d
  0x00007f31b08c737a: test    %esi,%esi
  0x00007f31b08c737c: mov     %esi,%eax
  0x00007f31b08c737e: cmovl   %r10d,%eax
  0x00007f31b08c7382: add     $0x20,%rsp
  0x00007f31b08c7386: pop     %rbp
  0x00007f31b08c7387: test    %eax,0x162c2c73(%rip)  ;   {poll_return}
  0x00007f31b08c738d: retq
  0x00007f31b08c738e: mov     %esi,(%rsp)
  0x00007f31b08c7391: mov     $0xffffff65,%esi
  0x00007f31b08c7396: nop
  0x00007f31b08c7397: callq   0x7f31b07b11a0    ; OopMap{off=60}
                                                ;*if_icmpne
                                                ; - math.FastAbs::safeAbsFast@3 (line 17)
                                                ;   {runtime_call}
  0x00007f31b08c739c: callq   0x7f31c5863d20    ;   {runtime_call}

Benchmark code:

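The question does not show the benchmark source. Judging from the answer's disassembly (each result is added into a running sum, and the loop is bounded by 989680h = 10_000_000), a comparable plain-Java harness might look like the sketch below. The class name, the uniform random data, the seed and the System.nanoTime timing are all assumptions. It is not JMH, so it has no proper warmup or fork control, and its timings are only indicative.

```java
import java.util.Random;

// Hypothetical re-creation of the benchmark shape: sum safeAbs(x) over an int[]
// of 10_000_000 values. Not the question author's code.
class SafeAbsHarness {
    static int safeAbsSlow(int i) {
        i = Math.abs(i);
        return i < 0 ? 0 : i;
    }

    static int safeAbsFast(int i) {
        return i == Integer.MIN_VALUE ? 0 : Math.abs(i);
    }

    // Mixed-sign values in random order (assumption: uniform over the whole int range).
    static int[] randomData(int n, long seed) {
        Random rnd = new Random(seed);
        int[] data = new int[n];
        for (int k = 0; k < n; k++) {
            data[k] = rnd.nextInt();
        }
        return data;
    }

    // The int sum may wrap around; that is harmless because both loops wrap identically.
    static int sumSlow(int[] data) {
        int sum = 0;
        for (int x : data) sum += safeAbsSlow(x);
        return sum;
    }

    static int sumFast(int[] data) {
        int sum = 0;
        for (int x : data) sum += safeAbsFast(x);
        return sum;
    }

    public static void main(String[] args) {
        int[] data = randomData(10_000_000, 42L);
        // Crude warmup: later rounds are more likely to run C2-compiled code.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            int a = sumSlow(data);
            long t1 = System.nanoTime();
            int b = sumFast(data);
            long t2 = System.nanoTime();
            System.out.printf("slow %d ms, fast %d ms, sums equal: %b%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a == b);
        }
    }
}
```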

Results (Linux x86-64, 7820HQ; checked on Oracle JDK 8 and 11 with very similar results):

Benchmark                      Mode  Cnt         Score        Error  Units
SafeAbsMicroBench.safeAbsFast  avgt   10   6435155.516 ±  47130.767  ns/op
SafeAbsMicroBench.safeAbsSlow  avgt   10  35646411.744 ± 776173.621  ns/op

Can someone explain why the first version is so much slower than the second?

Answer by Ole*_*hov (score 6)

The native code generated for the safeAbsSlow and safeAbsFast methods differs.

safeAbsSlow (C2, tier 4):

0x0000023d12ec4b14: add     eax,ecx
0x0000023d12ec4b16: inc     ebx

0x0000023d12ec4b18: cmp     ebx,989680h
0x0000023d12ec4b1e: jnl     23d12ec4b4eh ; jump if `ebx` was not less than `10_000_000`

0x0000023d12ec4b20: mov     ecx,dword ptr [r9+rbx*4+10h]

0x0000023d12ec4b25: test    ecx,ecx
0x0000023d12ec4b27: jnl     23d12ec4b14h ; jump if `ecx` was not less-than `0`

0x0000023d12ec4b29: neg     ecx

0x0000023d12ec4b2b: test    ecx,ecx
0x0000023d12ec4b2d: jnl     23d12ec4b14h ; jump if `ecx` was not less-than `0`

safeAbsFast (C2, tier 4):

0x000001d89e8a4b20: mov     ecx,dword ptr [r9+rdi*4+10h]

0x000001d89e8a4b25: cmp     ecx,80000000h
0x000001d89e8a4b2b: je      1d89e8a4b66h ; jump if `ecx` was equal to `-2147483648` (Integer.MIN_VALUE)

0x000001d89e8a4b2d: mov     r11d,ecx
0x000001d89e8a4b30: neg     r11d
0x000001d89e8a4b33: test    ecx,ecx
0x000001d89e8a4b35: cmovl   ecx,r11d

0x000001d89e8a4b39: add     eax,ecx
0x000001d89e8a4b3b: inc     edi

0x000001d89e8a4b3d: cmp     edi,989680h
0x000001d89e8a4b43: jl      1d89e8a4b20h ; jump if `edi` was less than `10_000_000`

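From the above we can see that safeAbsSlow contains more conditional jumps than safeAbsFast.

In particular, the Math.abs implementation inlined into safeAbsFast has no conditional jump at all; it uses a conditional move (cmovl) instead: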

0x000001d89e8a4b2d: mov     r11d,ecx
0x000001d89e8a4b30: neg     r11d
0x000001d89e8a4b33: test    ecx,ecx
0x000001d89e8a4b35: cmovl   ecx,r11d
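The cmovl sequence makes the fast loop insensitive to how the signs of the input are distributed. For illustration only, the same branch-free idea can be written directly in Java using sign-mask arithmetic. This is a different technique, used in neither the question nor the answer, and the class name is hypothetical. Whether C2 actually emits straight-line code for it is not guaranteed; you can check with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, which needs the hsdis disassembler plugin.

```java
// Hypothetical hand-written branch-free variants (not from the question or answer).
class BranchFreeAbs {
    // Sign-mask trick: mask is 0 for i >= 0 and -1 (all ones) for i < 0.
    // (i ^ mask) - mask equals i when mask == 0 and ~i + 1 == -i when mask == -1.
    // Like Math.abs, this returns Integer.MIN_VALUE for Integer.MIN_VALUE.
    static int abs(int i) {
        int mask = i >> 31;
        return (i ^ mask) - mask;
    }

    // safeAbs without a conditional: the only negative result of abs() is MIN_VALUE,
    // and r >> 31 is -1 exactly then, so ~(r >> 31) is 0 and the AND clears r.
    // For every other input, ~(r >> 31) is -1 and the AND leaves r unchanged.
    static int safeAbs(int i) {
        int r = abs(i);
        return r & ~(r >> 31);
    }

    public static void main(String[] args) {
        System.out.println(abs(-7));                    // 7
        System.out.println(abs(Integer.MIN_VALUE));     // -2147483648
        System.out.println(safeAbs(Integer.MIN_VALUE)); // 0
        System.out.println(safeAbs(-7));                // 7
    }
}
```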

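As a result, when the data set contains both positive and negative values scattered throughout the array, the slow version suffers far more branch misses than the normal one. Here are the corresponding statistics collected with the Linux perf profiler: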

Benchmark                          Mode  Cnt          Score         Error  Units
safeAbsFast                        avgt   10    9611659.726 ± 1429082.431  ns/op
safeAbsFast:branch-misses          avgt            2869.853                 #/op
safeAbsFast:branches               avgt        12492918.020                 #/op
safeAbsFast:cycles                 avgt        28212203.936                 #/op
safeAbsFast:instructions           avgt        92352048.153                 #/op
safeAbsSlow                        avgt   10   44524180.366 ± 6324887.086  ns/op
safeAbsSlow:branch-misses          avgt         5006493.144                 #/op
safeAbsSlow:branches               avgt        17496069.911                 #/op
safeAbsSlow:cycles                 avgt       126413171.674                 #/op
safeAbsSlow:instructions           avgt        67549877.558                 #/op

By contrast, here are the results for a sorted data set:

Benchmark                          Mode  Cnt         Score         Error  Units
safeAbsFast                        avgt   10   9026800.584 ±  528992.157  ns/op
safeAbsFast:branch-misses          avgt           2785.463                 #/op
safeAbsFast:branches               avgt       12474751.905                 #/op
safeAbsFast:cycles                 avgt       27379727.603                 #/op
safeAbsFast:instructions           avgt       92418075.715                 #/op
safeAbsSlow                        avgt   10   6981828.374 ± 2375480.834  ns/op
safeAbsSlow:branch-misses          avgt           2801.022                 #/op
safeAbsSlow:branches               avgt       17496585.992                 #/op
safeAbsSlow:cycles                 avgt       19478382.113                 #/op
safeAbsSlow:instructions           avgt       67589946.278                 #/op

When the data set is sorted, the formerly slow version even becomes the faster one (in this case the costly branch misses are minimized).
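Why sorting helps can be made concrete by counting how often the sign changes between consecutive elements. That is roughly how often the first `test ecx,ecx` / `jnl` branch in the safeAbsSlow loop changes direction. A minimal sketch with a hypothetical class name, assuming uniformly random input as in the harness idea above: random data flips sign on roughly half of the steps, while sorted data flips at most once.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative only: a rough proxy for branch predictability, not a measurement of it.
class BranchPattern {
    // Number of positions where the sign (negative vs non-negative) differs from the previous element.
    static int signFlips(int[] data) {
        int flips = 0;
        for (int k = 1; k < data.length; k++) {
            if ((data[k - 1] < 0) != (data[k] < 0)) flips++;
        }
        return flips;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        int[] random = new int[1_000_000];
        for (int k = 0; k < random.length; k++) random[k] = rnd.nextInt();
        int[] sorted = random.clone();
        Arrays.sort(sorted); // all negatives first, then all non-negatives
        System.out.println("random: " + signFlips(random)); // roughly half of the length
        System.out.println("sorted: " + signFlips(sorted)); // at most 1
    }
}
```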


Environment:

openjdk version "12-internal" 2019-03-19
OpenJDK Runtime Environment (slowdebug build 12-internal+0-adhoc.jdk12)
OpenJDK 64-Bit Server VM (slowdebug build 12-internal+0-adhoc.jdk12, mixed mode)