Why does JMH say that returning 1 is faster than returning 0?

Art*_*yan 22 java performance benchmarking jmh

Can someone explain why JMH says that returning 1 is faster than returning 0?

Here is the benchmark code.

import org.openjdk.jmh.annotations.*;

import java.util.concurrent.TimeUnit;

@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Fork(value = 3, jvmArgsAppend = {"-server", "-disablesystemassertions"})
public class ZeroVsOneBenchmark {

    @Benchmark
    @Warmup(iterations = 3, time = 2, timeUnit = TimeUnit.SECONDS)
    public int zero() {
        return 0;
    }

    @Benchmark
    @Warmup(iterations = 3, time = 2, timeUnit = TimeUnit.SECONDS)
    public int one() {
        return 1;
    }
}

The results are as follows:

# Run complete. Total time: 00:03:05

Benchmark                       Mode   Samples        Score  Score error    Units
c.m.ZeroVsOneBenchmark.one     thrpt        60  1680674.502    24113.014   ops/ms
c.m.ZeroVsOneBenchmark.zero    thrpt        60   735975.568    14779.380   ops/ms

The same behavior shows up when comparing one, two and zero:

# Run complete. Total time: 01:01:56

Benchmark                       Mode   Samples        Score  Score error    Units
c.m.ZeroVsOneBenchmark.one     thrpt        90  1762956.470     7554.807   ops/ms
c.m.ZeroVsOneBenchmark.two     thrpt        90  1764642.299     9277.673   ops/ms
c.m.ZeroVsOneBenchmark.zero    thrpt        90   773010.467     5031.920   ops/ms

apa*_*gin 34

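JMH is a good tool, but it is still not perfect.

Of course, there is no speed difference between returning 0, 1 or any other integer. What does differ is how the value is consumed by JMH and how the HotSpot JIT compiles that consumption.

To prevent the JIT from optimizing calculations away, JMH uses the special Blackhole class to consume the values returned from a benchmark. Here is the method for integer values: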

public final void consume(int i) {
    if (i == i1 & i == i2) {
        // SHOULD NEVER HAPPEN
        nullBait.i1 = i; // implicit null pointer exception
    }
}

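Here i is the value returned from the benchmark; in your case it is either 0 or 1. When i == 1, the never-happen condition becomes if (1 == i1 & 1 == i2), which compiles as follows: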

0x0000000002b4ffe5: mov    0xb0(%r13),%r10d   ;*getfield i1
0x0000000002b4ffec: mov    0xb4(%r13),%r8d    ;*getfield i2
0x0000000002b4fff3: cmp    $0x1,%r8d
0x0000000002b4fff7: je     0x0000000002b50091  ;*return

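But when i == 0, the JIT tries to "optimize" the two comparisons against 0 using setne instructions. The resulting code, however, becomes too complicated: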

0x0000000002a40b28: mov    0xb0(%rdi),%r10d   ;*getfield i1
0x0000000002a40b2f: mov    0xb4(%rdi),%r8d    ;*getfield i2
0x0000000002a40b36: test   %r10d,%r10d
0x0000000002a40b39: setne  %r10b
0x0000000002a40b3d: movzbl %r10b,%r10d
0x0000000002a40b41: test   %r8d,%r8d
0x0000000002a40b44: setne  %r11b
0x0000000002a40b48: movzbl %r11b,%r11d
0x0000000002a40b4c: xor    $0x1,%r10d
0x0000000002a40b50: xor    $0x1,%r11d
0x0000000002a40b54: and    %r11d,%r10d
0x0000000002a40b57: test   %r10d,%r10d
0x0000000002a40b5a: jne    0x0000000002a40c15  ;*return

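That is, the slower return 0 is explained by the larger number of CPU instructions executed in Blackhole.consume().

Note to JMH developers: I would suggest rewriting Blackhole.consume like this: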

if (i == l1) {
    // SHOULD NEVER HAPPEN
    nullBait.i1 = i; // implicit null pointer exception
}

where volatile long l1 = Long.MIN_VALUE. In this case the condition will still always be false, but it will compile to the same code for all return values.
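To make the "always false" claim concrete, here is a minimal plain-Java sketch, independent of JMH, that evaluates both conditions for a few sample values. The class name ConsumeSketch, the helper method names and the field values i1 = 1 and i2 = 2 are assumptions made for illustration only; the sketch relies solely on i1 and i2 being distinct, and on the fact that an int widened to long can never equal Long.MIN_VALUE. It says nothing about how the JIT compiles either form.

```java
public class ConsumeSketch {
    // Assumption: two distinct values, so no int can equal both at once.
    static volatile int i1 = 1, i2 = 2;

    // Value taken from the suggestion above; it lies outside the int range.
    static volatile long l1 = Long.MIN_VALUE;

    // Condition from the quoted consume(int): non-short-circuit AND of two comparisons.
    static boolean originalCondition(int i) {
        return i == i1 & i == i2;
    }

    // Condition from the suggested rewrite: i is widened to long before comparing.
    static boolean proposedCondition(int i) {
        return i == l1;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int i : samples) {
            System.out.println(i + ": original=" + originalCondition(i)
                    + ", proposed=" + proposedCondition(i));
        }
    }
}
```

Both conditions evaluate to false for every sample, including the values stored in i1 and i2 themselves, so the guarded branch is never taken in either variant.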

  • The *real* takeaway from this example is that nanobenchmarks need to be verified at the assembly level, which JMH now makes convenient with -prof perfasm :) (12 upvotes)